<html>
<head>
  <title>Statistics - Cybersecurity - Topics in Statistics - Telematic support to students</title>
  <style type="text/css">
      
             iframe{ 
        display:block;
 -moz-transform-origin: top left; 
 -webkit-transform-origin: top left; 
 -o-transform-origin: top left; 
 -ms-transform-origin: top left; 
 transform-origin: top left; 
-webkit-transform:scale(0.4);
-moz-transform:scale(0.4);
       border: none;
    margin: 0;
    padding: 0;
   margin-bottom:-400px;
  margin-right:-15px;
 } 
           
      
        .style1 {
                font-size: x-large;
        }
        
        .style3 {
                font-size: larger;
        }
        
        .style4 {
                font-size: large;
        }
        
        .style5 {
                color: #009900;
        }
        
        .style6 {
                color: #FF3300;
                font-weight: bold;
        }
        
        .style7 {
                font-size: smaller;
        }
        
        .style9 {
                color: #FF0000;
        }
        
        .style10
      {
          font-family: Verdana;
      }
      .style11
      {
          font-family: Verdana;
          font-weight: bold;
          text-decoration: underline;
      }
      .style12
      {
          font-family: Verdana;
          font-size: medium;
      }
              
        .style14
      {
          font-family: Verdana;
          font-weight: bold;
          font-size: medium;
      }
      .style15
      {
          color: #FF3300;
          font-weight: bold;
          font-family: Verdana;
          font-size: medium;
      }
      .style16
      {
          font-size: medium;
      }
      .style17
      {
          font-family: Verdana;
          font-weight: bold;
          text-decoration: underline;
          font-size: medium;
      }
        
        .style18
      {
          font-size: small;
      }
      .style19
      {
          font-family: Verdana;
          font-weight: bold;
          text-decoration: underline;
          font-size: smaller;
      }
      .style22
      {
          font-weight: bold;
          font-size: medium;
      }
        
        .style23
      {
          font-size: x-small;
      }
        
        .style24
      {
          font-family: Verdana;
          font-weight: bold;
      }
        
        </style>

        <script src="https://cdn.sstatic.net/Js/third-party/citation-helper.js?v=2591ce444a3f"></script>


            <script type="text/x-mathjax-config">
                MathJax.Hub.Config({"HTML-CSS": { preferredFont: "TeX", availableFonts: ["STIX","TeX"], linebreaks: { automatic:true }, EqnChunk: (MathJax.Hub.Browser.isMobile ? 10 : 50) },
                    tex2jax: { inlineMath: [ ["$", "$"], ["\\\\(","\\\\)"] ], displayMath: [ ["$$","$$"], ["\\[", "\\]"] ], processEscapes: true, ignoreClass: "tex2jax_ignore|dno" },
                    TeX: {
                        extensions: ["begingroup.js"],
                        noUndefined: { attributes: { mathcolor: "red", mathbackground: "#FFEEEE", mathsize: "90%" } }, 
                        Macros: { href: "{}" } 
                    },
                    messageStyle: "none",
                    styles: { ".MathJax_Display, .MathJax_Preview, .MathJax_Preview > *": { "background": "inherit" } },
                    SEEditor: "mathjaxEditing"
            });
            </script>
            <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-AMS_HTML-full"></script>

</head>
<body style="margin-left: 10px">
  <p class="style1"><b><span class="style3"><span class="style5">
      <br />
      Statistics</span></span><span class="style5"><span class="style7">, Cybersecurity [Year 2024 - 25]</span></span></b><br />
  <span class="style4"><em><b>
      <br />
      Topics on Statistics with intensive computer applications   

             <br />
      <br />
      $ \int_0^t d S_u = \int_0^t \mu(S_u, u) du + \int_0^t\sigma(S_u, u) dW_u $</b></em></span></p>
  
    <p><em> 
  Support for the course and for remote teaching, by T. Gastaldi&nbsp;&nbsp; #Sapienzanonsiferma&nbsp; #Sapienzadoesnotstop</em><br />
  <br />
  (Instructor: <a href="mailto:tommaso.gastaldi@gmail.com">tommaso.gastaldi@gmail.com</a>,<br />
  <a href="https://www.datatime.eu/public/cybersecurity/">https://www.datatime.eu/public/cybersecurity/</a>)</p>



<p><br>
<span class=SpellE><b>WhatsApp</b></span><b> group for the students of this
course</b><br>
Invitation to join the <span class=SpellE>WhatsApp</span> group for this
course: <a href="https://chat.whatsapp.com/Kk3wRGmmxWH9RNUo01zFdX">https://chat.whatsapp.com/Kk3wRGmmxWH9RNUo01zFdX</a><br>
<br>
(When you first join, send a message with your name and student ID (&quot;<span class=SpellE>matricola</span>&quot;).)</p>

<br>____________________________________________________________________________________
<br>
<br>
<br>
<br>
<br><b>General notes for all homeworks</b>
<br>
<br>-Implement the exercises in a language of your choice among C#, VB, or JavaScript. For JavaScript, always use the latest 
ECMAScript (classes, let, const, no var, etc.) and <b>strict mode</b> (WebStorm or Rider can help you 
stay up to date with the latest language features and check syntax). Publish the JavaScript programs directly online as web pages.
<br>
<br>-All important code must be shown and, where possible, discussed (the crucial parts only) in the homework web page, so that the reader can understand the main points.
<br>(The full version can be stored on GitHub or as a zip file containing the "solution", if you like, but that is not required.) 
<br>
<br>-Never use any third-party library external to the language, or higher-level languages and packages (e.g., SAS, R, Python, MATLAB, Minitab, etc.), because our purpose is to implement the very basics from scratch in order to understand our topics deeply. (Using other people's "black boxes" would defeat our learning purpose.)
<br>
<br>-Always exercise your capacity for abstraction. Never write algorithms that work only on specific cases or data; on the contrary, try to be as general as possible in all of your creations and logic. Use smart personal implementations to show your intelligence and insight! Originality and deep thinking are the most appreciated values in this course.
<br>
<br>-Always acknowledge your sources and use quotation marks when you copy and paste text from other sources (note that what you copy may be wrong!).
<br>
<br>
<br>
<br>
<b>Homework 1</b>
<br>
<br><b>Theory (intro)</b>
<br>- Basic notions in Statistics: Population, Statistical Units, Distribution, Frequency (relative, absolute, percentage);
<br>- Notion of arithmetic average. Derivation. Computational problems with floating point representation (errors, catastrophic cancellation) and numerical solution (Knuth);
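<br>
<br>A minimal sketch of the incremental mean update usually attributed to Knuth, next to the naive sum-then-divide formula. The names naiveMean and runningMean are illustrative assumptions, not part of the course material.
<pre>
```javascript
'use strict';

// Naive mean: accumulate the sum, then divide. The partial sum can grow
// large and lose low-order digits before the final division.
const naiveMean = (xs) => xs.reduce((s, x) => s + x, 0) / xs.length;

// Running mean: mean_k = mean_{k-1} + (x_k - mean_{k-1}) / k.
// Only a small correction is added at each step.
const runningMean = (xs) => {
    let mean = 0;
    let k = 0;
    for (const x of xs) {
        k += 1;
        mean += (x - mean) / k;
    }
    return mean;
};

const data = [1e9 + 0.1, 1e9 + 0.2, 1e9 + 0.3];
console.log(naiveMean(data), runningMean(data));
```
</pre>
The running version never stores a large partial sum, which is why it is usually preferred when many values of similar magnitude are averaged.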
<br>
<br><b>Applications / practice</b>
<br>
<br> 
We have n servers and m attackers. Each attacker has probability p of penetrating each server. Make a graphical representation of each attacker's trajectory (the line stays flat if the attacker does not penetrate a server and jumps by 1 if they do), and try different values of n, m, p.
At time n, we want the complete distribution of how many attackers reached each level. (Draw the distribution histogram vertically 
at the end of the chart, so that each rectangle representing the attackers' frequency is placed at the corresponding number of penetrations (or "successes") they achieved.)
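<br>
<br>The data behind such a chart could be generated along the following lines. This is only a sketch under assumptions: the function names simulateAttacks and finalDistribution are hypothetical, the attempts are taken to be independent, and the drawing itself (canvas or similar) is left out.
<pre>
```javascript
'use strict';

// Simulate m attackers, each attempting n servers in sequence.
// Each attempt succeeds independently with probability p.
// Returns one trajectory per attacker: the cumulative number of
// penetrations after each server (length n + 1, starting at 0).
const simulateAttacks = (n, m, p, rng = Math.random) => {
    const trajectories = [];
    for (let a = 0; a < m; a++) {
        const path = [0];
        for (let s = 0; s < n; s++) {
            const jump = rng() < p ? 1 : 0; // flat (0) or jump (+1)
            path.push(path[s] + jump);
        }
        trajectories.push(path);
    }
    return trajectories;
};

// Absolute frequency of attackers at each final level 0..n.
const finalDistribution = (trajectories, n) => {
    const counts = new Array(n + 1).fill(0);
    for (const path of trajectories) {
        counts[path[n]] += 1;
    }
    return counts;
};

const paths = simulateAttacks(10, 1000, 0.3);
console.log(finalDistribution(paths, 10));
```
</pre>
Each entry of the returned array is the height of one histogram rectangle, to be drawn at the level equal to its index.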

<br>
<br>Some resources:
<br>

<br><a href="https://en.wikipedia.org/wiki/Variable_and_attribute_(research)" target="_blank">https://en.wikipedia.org/wiki/Variable_and_attribute_(research)</a>
<br><a href="https://www.investopedia.com/terms/s/statistics.asp" target="_blank">https://www.investopedia.com/terms/s/statistics.asp</a>
<br><a href="https://www.scribbr.com/methodology/sampling-methods/#:~:text=Probability%20sampling%20methods%20include%20simple,a%20chance%20of%20being%20included." target="_blank">https://www.scribbr.com/methodology/sampling-methods/#:~:text=Probability%20sampling%20methods%20include%20simple,a%20chance%20of%20being%20included.</a>
<br><a href="https://en.wikipedia.org/wiki/Design_of_experiments" target="_blank">https://en.wikipedia.org/wiki/Design_of_experiments</a>
<br><a href="https://www.surveymonkey.com/mp/open-ended-questions-get-more-context-to-enrich-your-data/#:~:text=open%2Dended%20questions%3F-,So%20what%20are%20open%2Dended%20questions%3F,or%20other%20closed%2Dended%20format." target="_blank">https://www.surveymonkey.com/mp/open-ended-questions-get-more-context-to-enrich-your-data/#:~:text=open%2Dended%20questions%3F-,So%20what%20are%20open%2Dended%20questions%3F,or%20other%20closed%2Dended%20format.</a>
<br><a href="https://en.wikipedia.org/wiki/Level_of_measurement" target="_blank">https://en.wikipedia.org/wiki/Level_of_measurement</a>

<br>
<br><a href="https://www.youtube.com/watch?v=uHRqkGXX55I&ab_channel=SimpleLearningPro" target="_blank">https://www.youtube.com/watch?v=uHRqkGXX55I&ab_channel=SimpleLearningPro</a>
<br><a href="https://www.youtube.com/watch?v=EZrP_av3cmA&ab_channel=SimpleLearningPro" target="_blank">https://www.youtube.com/watch?v=EZrP_av3cmA&ab_channel=SimpleLearningPro</a>
<br><a href="https://www.youtube.com/watch?v=pTuj57uXWlk&ab_channel=SimpleLearningPro" target="_blank">https://www.youtube.com/watch?v=pTuj57uXWlk&ab_channel=SimpleLearningPro</a>
<br><a href="https://www.youtube.com/watch?v=10ikXret7Lk&ab_channel=SimpleLearningPro" target="_blank">https://www.youtube.com/watch?v=10ikXret7Lk&ab_channel=SimpleLearningPro</a>
<br>
<br>
<br>

<b>Homework 2</b>
<br>
<br><b>Theory</b>
<br>
Find the simplest and most elegant way to show the Welford recursion.
<br>
<br>
<br><b>Application / practice</b>
<br>
<br>
Refine your Euler–Maruyama simulator to approximate numerical solutions of stochastic differential equations (SDE), by adding the following variants to the existing framework:
<br>
<br>A. Jumps -1 +1 with prob. p [random walk]
<br>B. Absolute and relative frequency trajectories
<br>
<br>C. Final distribution and intermediate distributions (at one internal time/step selectable from the gui), 
with mean and variance (make it all parametric so that one unique interface will handle it all).
<br>
<br>Research
<br>Make your personal notes about the behavior of mean and variance with respect to time. For instance:
<br>What did you observe in all the 4 different cases (relative/absolute frequency &amp; Bernoulli/random walk)?
<br>What are the main differences between the distribution of the absolute number of successes 
and that of the relative frequencies?
<br>
<br>

<br>
<br>Some resources:
<br>

<br><a href="https://en.wikipedia.org/wiki/Euler%E2%80%93Maruyama_method" target="_blank">https://en.wikipedia.org/wiki/Euler%E2%80%93Maruyama_method</a>
<br><a href="https://en.wikipedia.org/wiki/Random_walk" target="_blank">
https://en.wikipedia.org/wiki/Random_walk</a>
<br><a href="https://en.wikipedia.org/wiki/Multinomial_distribution" target="_blank">
https://en.wikipedia.org/wiki/Multinomial_distribution</a>

<br>
<br>
<br>
<br>

<b>Homework 3</b>
<br>
<br><b>Theory/Research</b>
<br>
<br>
Illustrate formally, in the simplest possible way, why the median is the value of c that minimizes the sum of |x(i) - c| (sum of absolute deviations).

<br>
<br>
Find all the conceptually different ways to define a "location" statistic (sometimes also called "center" or "central tendency"), i.e., a synthesis of a distribution. Show how the generalization of these ideas can potentially lead to infinitely many other definitions.
<br>
<br>
<br><b>Application / practice</b>
<br>
<br>
Refine your SDE simulator to simulate a continuous time process where we can have an attack (indicated with a jump of +1) at any 
time with a constant rate of attack.
To create the approximation of time continuity subdivide your reference temporal window into numerous intervals
of vanishing size dt = 1/n and to each infinitesimal interval assign a probability of a +1 "jump" (attack success) equal 
to Lambda * dt, where Lambda is a simulation parameter, having the meaning of expected total number of attacks in the reference 
period.
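<br>
<br>The discretization described above can be sketched as follows. The function names are hypothetical and the reference window is assumed to be [0, 1], so that dt = 1/n; as n grows, the final count should behave approximately like a Poisson variable with mean Lambda.
<pre>
```javascript
'use strict';

// Approximate a counting process on the window [0, 1] by splitting it into
// n intervals of size dt = 1 / n; each interval has a +1 jump with
// probability lambda * dt (this requires lambda * dt <= 1).
const simulateCountingPath = (lambda, n, rng = Math.random) => {
    const dt = 1 / n;
    const path = [0];
    for (let i = 0; i < n; i++) {
        const jump = rng() < lambda * dt ? 1 : 0;
        path.push(path[i] + jump);
    }
    return path;
};

// Average final count over many simulated paths; it should be near lambda.
const meanFinalCount = (lambda, n, runs, rng = Math.random) => {
    let total = 0;
    for (let r = 0; r < runs; r++) {
        total += simulateCountingPath(lambda, n, rng)[n];
    }
    return total / runs;
};

console.log(meanFinalCount(5, 1000, 2000));
```
</pre>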

<br>
<br>Some resources:
<br>

<br><a href="https://en.wikipedia.org/wiki/Stochastic_simulation" target="_blank">
https://en.wikipedia.org/wiki/Stochastic_simulation
</a>
<br><a href="https://www.probabilitycourse.com/chapter11/11_1_2_basic_concepts_of_the_poisson_process.php" target="_blank">
https://www.probabilitycourse.com/chapter11/11_1_2_basic_concepts_of_the_poisson_process.php</a>
<br><a href="https://www.probabilitycourse.com/chapter7/7_1_1_law_of_large_numbers.php" target="_blank">
https://www.probabilitycourse.com/chapter7/7_1_1_law_of_large_numbers.php</a>

<br>
<br>
<br><br>
<br>

<b>Homework 4</b>
<br>
<br><b>Theory/Research</b>
<br>
<br>
Illustrate the concept of statistical independence, showing also the analogies with the formal definitions in probability theory.
<br>
<br><b>Application / practice</b>
<br>
<br>
<br>Refine your SDE simulator to generate a continuous-time process representing the scaling limit of the random walk. 
<br>To approximate time continuity, subdivide your reference temporal window into intervals of vanishing size
<br>dt and, on each (theoretically infinitesimal) interval, make a jump of +sqrt(dt) or -sqrt(dt) with probability p or 1 - p (p = 1/2 for the symmetric walk).
<br>Note the significance of the simulation (Donsker's invariance principle, also known as the functional central limit theorem)
<br>in relation to the Wiener process.
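<br>
<br>A possible sketch of the scaled walk, assuming the symmetric case p = 1/2 and a window [0, T]. The names simulateScaledWalk and endpointMoments are hypothetical helpers, not part of an existing simulator.
<pre>
```javascript
'use strict';

// Scaled symmetric random walk on [0, T]: n steps of size dt = T / n,
// each step moves by +sqrt(dt) or -sqrt(dt) with probability 1/2.
const simulateScaledWalk = (T, n, rng = Math.random) => {
    const dt = T / n;
    const step = Math.sqrt(dt);
    const path = [0];
    for (let i = 0; i < n; i++) {
        path.push(path[i] + (rng() < 0.5 ? step : -step));
    }
    return path;
};

// Sample mean and (uncorrected) variance of the endpoint over many runs,
// accumulated with a Welford-style update; for a Wiener process W(T)
// the theoretical values are 0 and T.
const endpointMoments = (T, n, runs, rng = Math.random) => {
    let mean = 0;
    let m2 = 0;
    for (let k = 1; k <= runs; k++) {
        const x = simulateScaledWalk(T, n, rng)[n];
        const delta = x - mean;
        mean += delta / k;
        m2 += delta * (x - mean);
    }
    return { mean, variance: m2 / runs };
};

console.log(endpointMoments(1, 500, 2000));
```
</pre>
The step size sqrt(dt) is what keeps the variance at time T equal to T regardless of n, which is the scaling used in Donsker's theorem.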

<br>
<br>Some resources:
<br>

<br><a href="https://en.wikipedia.org/wiki/Donsker%27s_theorem" target="_blank">
https://en.wikipedia.org/wiki/Donsker%27s_theorem
</a>
<br><a href="https://www.youtube.com/watch?v=sJPlOMrcJXo&ab_channel=ResearchMethodsandStatistics%28FMG%2CUvA%29" target="_blank">
https://www.youtube.com/watch?v=sJPlOMrcJXo&ab_channel=ResearchMethodsandStatistics%28FMG%2CUvA%29</a>


<br>
<br>
<br>
<br>
<br>

<b>Homework 5</b>
<br>
<br>- Prove in the simplest possible way the C-S (Cauchy-Schwarz) inequality
<br>(hint: think of the normalizing denominator of the correlation coefficient r)
<br>
<br>- Reflect on the concepts of independence and uncorrelatedness, pointing
<br>out conceptual differences and possible measures.
<br>
<br>- E-M Simulator Enhancement:
<br>Enhance your existing Euler-Maruyama (E-M) simulator by developing a unified simulation framework. Create a general central class that can manage various types of stochastic differential equations (SDEs).
<br>
<br>Optional: Regression Coefficients:
<br>Derive the coefficients (b) and (a) of the two regression lines using the least squares method, and show the relationships with R^2.
<br>
<br>
<br>
<br>
<br>

<b>Homework 6</b>
<br>
<br><b>Theory/Research</b>
<br>
Research: Recall the fundamental theorem of calculus and demonstrate its relationship with density 
functions and cumulative distribution functions (CDFs).
 
<br>
<br><b>Application / practice</b>
<br>
Exercise: Generate realizations from a discrete univariate probability distribution with arbitrary probabilities.
Graphically show the convergence of the empirical distribution to the theoretical distribution as the sample size increases.
Compute also, during the generation, the mean and variance using recursive methods (e.g., Knuth's/Welford's algorithms) 
and compare these results with the theoretical mean and variance, discussing the relationship.
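<br>
<br>One way to set this up is inverse-CDF sampling combined with a Welford-style accumulator. The sketch below is illustrative only: sampleDiscrete and OnlineStats are hypothetical names, and the three-point distribution is an arbitrary example.
<pre>
```javascript
'use strict';

// Draw one value from a discrete distribution by inverting the CDF:
// values[i] is returned with probability probs[i] (probs must sum to 1).
const sampleDiscrete = (values, probs, rng = Math.random) => {
    const u = rng();
    let cumulative = 0;
    for (let i = 0; i < values.length; i++) {
        cumulative += probs[i];
        if (u < cumulative) return values[i];
    }
    return values[values.length - 1]; // guard against rounding in the sum
};

// Welford-style online accumulator for mean and variance.
class OnlineStats {
    constructor() {
        this.n = 0;
        this.mean = 0;
        this.m2 = 0; // running sum of squared deviations from the mean
    }

    add(x) {
        this.n += 1;
        const delta = x - this.mean;
        this.mean += delta / this.n;
        this.m2 += delta * (x - this.mean);
    }

    get variance() {
        return this.n > 0 ? this.m2 / this.n : NaN; // uncorrected (divide by n)
    }
}

const values = [1, 2, 3];
const probs = [0.2, 0.5, 0.3];
const stats = new OnlineStats();
const counts = new Map();
for (let i = 0; i < 10000; i++) {
    const x = sampleDiscrete(values, probs);
    stats.add(x);
    counts.set(x, (counts.get(x) ?? 0) + 1);
}
// Theoretical mean = 2.1, theoretical variance = 0.49.
console.log(stats.mean, stats.variance, counts);
```
</pre>
Dividing each count by the number of draws gives the empirical distribution, which can be plotted against probs for increasing sample sizes.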
<br>
<br>
<br>
<br>

<b>Homework 7</b>
<br>
<br><b>Theory/Research</b>
<br>
<br><b>Application / practice</b>
<br>
<br>Using the setup of previous homework, from a discrete distribution generate m (e.g. m=1000 ...) samples 
<br>of size n (e.g., n = 20, 30, 100, ...). Compute the distribution of the sampling average. 
<br>Determine the average and variance of the distribution of the averages of the samples, and represent the distribution, 
<br>discussing the observed relationship with the mean and variance of the parent (theoretical) distribution.
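<br>
<br>A compact sketch of this experiment, under the assumption of the same arbitrary three-point parent distribution (mean 2.1, variance 0.49); all function names are hypothetical. The mean of the sample averages should be close to the parent mean, and their variance close to the parent variance divided by n.
<pre>
```javascript
'use strict';

// Inverse-CDF draw from a discrete distribution (probs must sum to 1).
const sampleDiscrete = (values, probs, rng = Math.random) => {
    const u = rng();
    let cumulative = 0;
    for (let i = 0; i < values.length; i++) {
        cumulative += probs[i];
        if (u < cumulative) return values[i];
    }
    return values[values.length - 1];
};

// Generate m samples of size n and return the m sample averages.
const sampleMeans = (values, probs, m, n, rng = Math.random) => {
    const means = [];
    for (let j = 0; j < m; j++) {
        let sum = 0;
        for (let i = 0; i < n; i++) sum += sampleDiscrete(values, probs, rng);
        means.push(sum / n);
    }
    return means;
};

// Mean and uncorrected variance of an array of numbers.
const moments = (xs) => {
    const mean = xs.reduce((s, x) => s + x, 0) / xs.length;
    const variance = xs.reduce((s, x) => s + (x - mean) ** 2, 0) / xs.length;
    return { mean, variance };
};

// Parent distribution: mean 2.1, variance 0.49.
const result = moments(sampleMeans([1, 2, 3], [0.2, 0.5, 0.3], 1000, 30));
// Expect result.mean near 2.1 and result.variance near 0.49 / 30.
console.log(result);
```
</pre>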
<br>
<br>
<br>
<b>Optional</b>
<br>
<br>Given the random variable Y = g^U mod n (meaning the remainder of the division by n) 
<br>where U is a Uniform in [1, max_U] (max_U is a user param)
<br>
<br>A) Generate the distributions of Y for n = 19 and g = 2, 3, 10, 17 
<br>B) Generate the distributions of Y for n = 15 and g = 3, 6, 9, 12
<br>
<br>Observe the shape of the distributions and compute the entropy or other diversity indexes. Give your opinion on 
<br>the implications of any observed differences in terms of cryptographic properties (uniformity, predictability) 
<br>and potential applications. Why might case A be better suited for cryptographic applications? Why might case B 
<br>(predictability, lower entropy?) illustrate possible vulnerabilities, if any?
<br>What is the reason why we chose the set { 2, 3, 10, 17 } in case A? Spot possible errors in the exercise.
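<br>
<br>A starting point for this exercise might look like the sketch below. It enumerates U = 1..max_U exactly instead of drawing random values, which is a simplifying assumption; modPow, distributionOfY and entropyBits are hypothetical names, and max_U = 72 is an arbitrary choice.
<pre>
```javascript
'use strict';

// Modular exponentiation by repeated squaring, using BigInt to avoid overflow.
const modPow = (base, exponent, modulus) => {
    const m = BigInt(modulus);
    let result = 1n % m;
    let b = BigInt(base) % m;
    let e = BigInt(exponent);
    while (e > 0n) {
        if (e & 1n) result = (result * b) % m;
        b = (b * b) % m;
        e >>= 1n;
    }
    return Number(result);
};

// Distribution of Y = g^U mod n when U is uniform on 1..maxU (exact enumeration).
const distributionOfY = (g, n, maxU) => {
    const counts = new Map();
    for (let u = 1; u <= maxU; u++) {
        const y = modPow(g, u, n);
        counts.set(y, (counts.get(y) ?? 0) + 1);
    }
    return new Map([...counts].map(([y, c]) => [y, c / maxU]));
};

// Shannon entropy in bits of a distribution given as a Map of probabilities.
const entropyBits = (dist) => {
    let h = 0;
    for (const p of dist.values()) {
        if (p > 0) h -= p * Math.log2(p);
    }
    return h;
};

for (const [n, gs] of [[19, [2, 3, 10, 17]], [15, [3, 6, 9, 12]]]) {
    for (const g of gs) {
        const dist = distributionOfY(g, n, 72);
        console.log(`n=${n} g=${g} support=${dist.size} H=${entropyBits(dist).toFixed(3)} bits`);
    }
}
```
</pre>
Comparing the support size and entropy across the values of g is one way to see which bases spread Y over many residues and which do not.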

<br>
<br> 
<br>
 <b>Homework 8</b>
<br>
<br><b>Theory/Research</b>
<br>
<br>Recall the notion of Shannon entropy and other diversity measures of distributions.
<br>
<br>Recall the notion of primitive root (a primitive root modulo a prime number p is a number g such that, for every 
<br>integer a that is coprime to p, there exists an integer k such that $g^k \equiv a \pmod{p}$).
<br>
<br><b>Application / practice</b>
<br>
<br>
<br><b> Part 1</b>
<br>Find and compile a sufficiently large piece of text by selecting several web pages and create a letter frequency distribution.
<br>Choose a random shift value (e.g., 1-25, with wrap-around) and apply the Caesar cipher to encrypt the original text: 
<br>E = (L + shift) mod 26 for each letter L of the message (with A = 0, B = 1, ..., Z = 25).
<br>Use <b>frequency analysis</b> or find any efficient and effective strategy to find the shift and decrypt the message.
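<br>
<br>One possible strategy is to try all 26 shifts and keep the one whose letter counts are closest, in the chi-squared sense, to a reference English profile. The sketch below is an illustration, not the required solution: the function names are hypothetical, and the reference profile uses approximate percentages, with G taken as about 2.02% (an assumption).
<pre>
```javascript
'use strict';

// Approximate English letter frequencies in percent, ordered A..Z.
const ENGLISH_PERCENT = [8.17, 1.49, 2.78, 4.25, 12.70, 2.23, 2.02, 6.09, 7.00,
    0.15, 0.77, 4.03, 2.41, 6.75, 7.51, 1.93, 0.10, 5.99, 6.33, 9.06,
    2.76, 0.98, 2.36, 0.15, 1.97, 0.07];

// Shift the letters A-Z and a-z by `shift` positions; other characters are kept.
const shiftLetters = (text, shift) => {
    const s = ((shift % 26) + 26) % 26;
    return [...text].map((ch) => {
        const code = ch.charCodeAt(0);
        if (ch >= 'A' && ch <= 'Z') return String.fromCharCode((code - 65 + s) % 26 + 65);
        if (ch >= 'a' && ch <= 'z') return String.fromCharCode((code - 97 + s) % 26 + 97);
        return ch;
    }).join('');
};

// Count occurrences of each letter A..Z, ignoring case.
const letterCounts = (text) => {
    const counts = new Array(26).fill(0);
    for (const ch of text) {
        const code = ch.charCodeAt(0);
        if (ch >= 'A' && ch <= 'Z') counts[code - 65] += 1;
        else if (ch >= 'a' && ch <= 'z') counts[code - 97] += 1;
    }
    return counts;
};

// Chi-squared distance between the observed counts, after undoing a
// candidate shift, and the counts expected from the English profile.
const chiSquared = (counts, shift) => {
    const total = counts.reduce((a, b) => a + b, 0);
    let chi = 0;
    for (let i = 0; i < 26; i++) {
        const expected = total * ENGLISH_PERCENT[i] / 100;
        const observed = counts[(i + shift) % 26];
        chi += (observed - expected) ** 2 / expected;
    }
    return chi;
};

// Try all 26 shifts and keep the one with the smallest chi-squared distance.
const guessShift = (cipherText) => {
    const counts = letterCounts(cipherText);
    let best = 0;
    let bestChi = chiSquared(counts, 0);
    for (let s = 1; s < 26; s++) {
        const chi = chiSquared(counts, s);
        if (chi < bestChi) {
            best = s;
            bestChi = chi;
        }
    }
    return best;
};

const plain = 'In this course we simulate random attacks on a set of servers ' +
    'and then we study the distribution of the number of successes';
const cipher = shiftLetters(plain, 7);
const guessed = guessShift(cipher);
console.log(guessed, shiftLetters(cipher, -guessed));
```
</pre>
The longer the text, the more reliable the guess; on very short messages the letter counts may be too noisy for this criterion.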
<br>
<br> <b>Part 2 Optional (Modular exponentiation)</b>
<br>Convert each letter of the original text to a numeric representation (A = 0, B = 1, ..., Z = 25). 

<br>Choose Parameters: Choose an exponent e and a modulus P. Ensure that e and P are coprime 
<br>(for example, you might choose ( e = 3 ) and ( P = 37)).
<br>Calculate Encoded Values: Calculate the encoded values using the formula: E = L^e mod P
<br>for each letter L of the message, where L is the numeric representation of the letter. Try also encoding the
<br>entire message, rather than the single letters, and discuss the difference.
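<br>
<br>The letter-by-letter encoding can be sketched as below; modPow, encodeLetters and isInjectiveOnAlphabet are hypothetical helper names. The last helper checks whether distinct letters receive distinct codes: with e = 3 and P = 37 they do not (for instance 1 and 10 both map to 1, since 3 divides P - 1 = 36), whereas an exponent coprime to P - 1, such as 5, avoids the collisions.
<pre>
```javascript
'use strict';

// Modular exponentiation by repeated squaring (BigInt avoids overflow).
const modPow = (base, exponent, modulus) => {
    const m = BigInt(modulus);
    let result = 1n % m;
    let b = BigInt(base) % m;
    let e = BigInt(exponent);
    while (e > 0n) {
        if (e & 1n) result = (result * b) % m;
        b = (b * b) % m;
        e >>= 1n;
    }
    return Number(result);
};

// Encode each letter (A = 0, ..., Z = 25, case ignored) as L^e mod P;
// characters other than A-Z and a-z are skipped.
const encodeLetters = (text, e, P) =>
    [...text]
        .filter((ch) => /^[A-Za-z]$/.test(ch))
        .map((ch) => modPow(ch.toUpperCase().charCodeAt(0) - 65, e, P));

// Check whether the 26 letter codes are mapped to 26 distinct values.
const isInjectiveOnAlphabet = (e, P) => {
    const images = new Set();
    for (let L = 0; L < 26; L++) images.add(modPow(L, e, P));
    return images.size === 26;
};

console.log(encodeLetters('HELLO', 3, 37));
console.log(isInjectiveOnAlphabet(3, 37), isInjectiveOnAlphabet(5, 37));
```
</pre>
When the map is not one-to-one, decoding is ambiguous and the encoded frequency distribution merges the frequencies of the colliding letters.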

<br>
<br>See if you can find strategies and effective ways to get back the values of e and P. 
(In practice, certain values of e, like 3 or 65537 are commonly used. You may start with these values for e)
<br>
<br>Visualize the distributions and calculate the Shannon entropy of the transformed distributions. 
<br>Summarize the findings from both parts of the exercise. Discuss how statistical analysis enhances understanding 
<br>of cryptographic algorithms and the importance of these skills in cybersecurity.
<br>

<br>
<br>

Hints and resources: 
<pre>
    Function CaesarShift(input As String, shift As Integer) As String
        Dim result As New System.Text.StringBuilder()

        ' Normalize the shift so that negative values (used to decrypt) wrap around correctly
        Dim letterShift As Integer = ((shift Mod 26) + 26) Mod 26
        Dim digitShift As Integer = ((shift Mod 10) + 10) Mod 10

        For Each ch As Char In input
            If ch >= "A"c AndAlso ch <= "Z"c Then
                ' Handle uppercase letters (A-Z only, so accented letters are left untouched)
                Dim offset As Integer = Asc("A")
                Dim shiftedChar As Char = Chr((Asc(ch) - offset + letterShift) Mod 26 + offset)
                result.Append(shiftedChar)
            ElseIf ch >= "a"c AndAlso ch <= "z"c Then
                ' Handle lowercase letters (a-z only)
                Dim offset As Integer = Asc("a")
                Dim shiftedChar As Char = Chr((Asc(ch) - offset + letterShift) Mod 26 + offset)
                result.Append(shiftedChar)
            ElseIf ch >= "0"c AndAlso ch <= "9"c Then
                ' Handle digits (0-9)
                Dim offset As Integer = Asc("0")
                Dim shiftedChar As Char = Chr((Asc(ch) - offset + digitShift) Mod 10 + offset)
                result.Append(shiftedChar)
            Else
                ' All other characters are not shifted
                result.Append(ch)
            End If
        Next

        Return result.ToString()
    End Function
	
	//-----------------------
	
function caesarShift(input, shift) {
    // Normalize the shift so that negative values (used to decrypt) wrap around correctly
    const letterShift = ((shift % 26) + 26) % 26;
    const digitShift = ((shift % 10) + 10) % 10;
    let result = '';

    for (const ch of input) {
        if (ch >= 'A' && ch <= 'Z') {
            // Handle uppercase letters
            const offset = 'A'.charCodeAt(0);
            result += String.fromCharCode((ch.charCodeAt(0) - offset + letterShift) % 26 + offset);
        } else if (ch >= 'a' && ch <= 'z') {
            // Handle lowercase letters
            const offset = 'a'.charCodeAt(0);
            result += String.fromCharCode((ch.charCodeAt(0) - offset + letterShift) % 26 + offset);
        } else if (ch >= '0' && ch <= '9') {
            // Handle digits (0-9)
            const offset = '0'.charCodeAt(0);
            result += String.fromCharCode((ch.charCodeAt(0) - offset + digitShift) % 10 + offset);
        } else {
            // All other characters are not shifted
            result += ch;
        }
    }

    return result;
}

// Example usage
console.log(caesarShift("Hello, World! 123", 3)); // Output: "Khoor, Zruog! 456"
console.log(caesarShift("Khoor, Zruog! 456", -3)); // Output: "Hello, World! 123"


English (EN) Letter Frequency Distribution
1. K. M. O’Hara's Studies: This study provides a comprehensive analysis of letter frequencies in English text. Here's a rough frequency distribution based on various sources:

    - E: 12.70%
    - T: 9.06%
    - A: 8.17%
    - O: 7.51%
    - I: 7.00%
    - N: 6.75%
    - S: 6.33%
    - H: 6.09%
    - R: 5.99%
    - D: 4.25%
    - L: 4.03%
    - C: 2.78%
    - U: 2.76%
    - M: 2.41%
    - W: 2.36%
    - F: 2.23%
    - G: 2.02%
    - Y: 1.97%
    - P: 1.93%
    - B: 1.49%
    - V: 0.98%
    - K: 0.77%
    - J: 0.15%
    - X: 0.15%
    - Q: 0.10%
    - Z: 0.07%
   
2. Wikipedia: The page on Letter Frequencies (https://en.wikipedia.org/wiki/Letter_frequency) provides a good overview and includes references to original studies.
   
3. Cryptography and Information Security: There are many cryptography textbooks that discuss letter frequency, including those by authors such as Bruce Schneier and William Stallings.

Italian (ITA) Letter Frequency Distribution 
1. Italian Language Studies: Here’s a common frequency distribution for the Italian language:

    - E: 11.79%
    - A: 10.49%
    - I: 9.96%
    - O: 8.76%
    - T: 6.87%
    - N: 6.73%
    - R: 6.52%
    - S: 5.38%
    - L: 5.26%
    - U: 3.33%
    - D: 3.41%
    - C: 3.29%
    - M: 2.51%
    - P: 2.49%
    - H: 0.77%
    - B: 0.81%
    - F: 0.84%
    - X: 0.10%
    - J: 0.12%
    - K: 0.03%
    - Q: 0.52%
    - Z: 0.39%
    - W: 0.00%

</pre>


<br>
<br>Some resources:
<br>

<br><a href="https://en.wikipedia.org/wiki/Entropy_(information_theory)" target="_blank">
https://en.wikipedia.org/wiki/Entropy_(information_theory) </a>

<br><a href="https://en.wikipedia.org/wiki/Letter_frequency" target="_blank">
https://en.wikipedia.org/wiki/Letter_frequency </a>

<br><a href="https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence" target="_blank">
https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence </a>

<br><a href="https://en.wikipedia.org/wiki/Majorization" target="_blank">
https://en.wikipedia.org/wiki/Majorization </a>

<br><a href="https://en.wikipedia.org/wiki/Modular_exponentiation" target="_blank">
https://en.wikipedia.org/wiki/Modular_exponentiation </a>


<br>
<br>
<br>
<br>


<b>Homework 9</b>
<br>
<br><b>Theory/Research</b>
<br>
<br> Mention the main properties of the sampling mean and variance. 
<br> Illustrate the law of large numbers and some possible applications, especially related to cybersecurity concepts.
<br>
<br><b>Application / practice</b>
<br>Following the same scheme as Homework 7, compute the distribution of the sampling variance ("corrected" or not). 
<br>Determine the distribution of the variances of the samples, together with its mean and variance,
<br>discussing the observed relationship with the mean and variance of the parent (theoretical) distribution.
<br>
<br> <b>Optional</b>
<br>
<br><b>Theory/Research</b>
<br>Research: Recall the fundamental ideas of the main encryption methods and their statistical properties.
<br>
<br><b>AES-Inspired Encryption, Didactical "Toy" Version exercise</b>
<br>Objective: Apply statistics to learn about encryption and decryption using a simple substitution and permutation cipher. Gain insight into the fundamentals of encryption, key management, and frequency  analysis, similar to concepts used in AES and RSA, particularly focusing on how these methods affect frequency distribution and entropy.
<br>
<br>Create a Substitution Cipher
<br>Generate a Substitution Key: Create a random mapping of the letters A-Z. Each letter should map to a unique letter.  

<br>Example:  
 <pre>
 
A -> Q, B -> Z, C -> X, D -> W, E -> V, F -> U, G -> T, 
H -> R, I -> S, J -> P, K -> O, L -> N, M -> M, 
N -> L, O -> K, P -> J, Q -> I, R -> H, S -> G, 
T -> F, U -> E, V -> D, W -> C, X -> B, Y -> A, Z -> Y
</pre>

<br>Choose a message: Pick a short message to encrypt. Example: "HELLO WORLD".
<br>Encrypt the Text: Use your substitution key to transform each letter of your message. Write down the encrypted message.
<br>
<br><b>Statistical Analysis:</b>  
<br>
<br>Frequency Distribution: 
<br>Analyze the final encrypted message. Compare the frequency of letters in your original message and in the encrypted message. Discuss how the substitution cipher affects the distribution of letters.
<br>Entropy: Calculate the entropy of both the original and the encrypted messages. Discuss how the substitution affects the amount of uncertainty or randomness in the message.
<br>
<br>Permutation Step:
<br>Reverse the order of the encrypted letters to create the final encrypted output.
<br>Example: If your encrypted message is "RVNNK CKHNW" (the encryption of "HELLO WORLD" with the key above), reverse it to "WNHKC KNNVR".
<br>Discuss how reversing the order of letters affects the frequency distribution and entropy. Does it reveal or obscure any patterns?
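<br>
<br>The two steps can be tried out with a sketch like the following, which uses the example key listed above written as a 26-letter string; the helper names are hypothetical. With that key, "HELLO WORLD" becomes "RVNNK CKHNW" and then "WNHKC KNNVR". Since the substitution only relabels letters and the reversal only reorders them, the letter-frequency entropy stays the same at every stage.
<pre>
```javascript
'use strict';

// Example substitution key: position i holds the image of the i-th letter (A..Z).
const KEY = 'QZXWVUTRSPONMLKJIHGFEDCBAY';

// Substitute each uppercase letter A-Z using the key; other characters are kept.
const substitute = (text, key) =>
    [...text]
        .map((ch) => (/^[A-Z]$/.test(ch) ? key[ch.charCodeAt(0) - 65] : ch))
        .join('');

// Permutation step used in the exercise: reverse the whole string.
const reverse = (text) => [...text].reverse().join('');

// Shannon entropy (bits per letter) of the distribution of the letters A-Z.
const letterEntropy = (text) => {
    const counts = new Map();
    let total = 0;
    for (const ch of text) {
        if (ch >= 'A' && ch <= 'Z') {
            counts.set(ch, (counts.get(ch) ?? 0) + 1);
            total += 1;
        }
    }
    let h = 0;
    for (const c of counts.values()) h -= (c / total) * Math.log2(c / total);
    return h;
};

const message = 'HELLO WORLD';
const substituted = substitute(message, KEY); // "RVNNK CKHNW"
const finalText = reverse(substituted);       // "WNHKC KNNVR"
console.log(substituted, finalText);
console.log(letterEntropy(message), letterEntropy(substituted), letterEntropy(finalText));
```
</pre>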
<br>
<br>Encryption/Decryption Challenge.
<br>
Exchange encrypted messages with a classmate. Attempt to decode each other's messages as a challenge. Start with an encrypted message such as "WNHKC KNNVR".
<br>
Guess the original message using frequency analysis or pattern recognition. If you know the substitution key, decode it by reversing both the substitution and the permutation.
<br>
<br>
<b>Statistical Discussion:</b>  
<br>
Frequency Distribution Changes: How did the frequency distribution of letters change after applying the permutation step? Discuss the significance of this change in terms of statistical analysis and cryptography.
<br>
Entropy Considerations: Discuss how the overall entropy of the original and final messages compares. What does this indicate about the security and unpredictability of the encrypted message compared to the original?
<br>
Contrast with RSA concepts. Discuss how textbook (unpadded) RSA, being deterministic, preserves the structure of the frequency distribution when applied symbol by symbol, while making decryption hard without the private key.
<br>
Final thoughts on entropy and security: Reflect on the importance of entropy in cryptography. Consider how higher entropy in an encrypted message can enhance security by making it harder for attackers to predict or analyze the message content.
<br>
[ 
Notes:
<br>Comparison with AES: In your discussions, compare how AES significantly alters the frequency distribution and entropy of plaintext through complex transformations, including multiple layers of substitution and permutation, as well as an integral <b>XOR</b> operation with a key. Although we are skipping the XOR steps for simplicity in this assignment, they are crucial in the AES process, as they introduce an additional layer of complexity that enhances security and makes reverse engineering through frequency analysis difficult.]  

<br>
<br>
<br>
<br>


<b>Homework 10</b>
<br>
<br><b>Theory/Research</b>
<br> General concept of sampling mean and variance and main features of their distributions
<br> General idea of Lebesgue–Stieltjes integration and applications to probability theory and to measure theory
<br>
<br><b>Application / practice</b>
<br> Try to compute a Lebesgue integral numerically and compare it with the Riemann integral (you might
compute the mean or variance of a distribution).
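<br>
<br>As a rough illustration, the sketch below integrates f(x) = x^2 on [0, 1] (the second moment of a uniform variable, exactly 1/3) in two ways: a Riemann sum that partitions the domain, and a Lebesgue-style lower sum that partitions the range and weights each level by the measure of the set where f takes values in that level. The function names, the grid-based estimate of the measure, and the assumption that f is non-negative are all simplifications made here.
<pre>
```javascript
'use strict';

// Riemann sum (midpoint rule): partition the DOMAIN [a, b] into n cells.
const riemannIntegral = (f, a, b, n) => {
    const dx = (b - a) / n;
    let sum = 0;
    for (let i = 0; i < n; i++) sum += f(a + (i + 0.5) * dx) * dx;
    return sum;
};

// Lebesgue-style lower sum: partition the RANGE [0, yMax) into k levels and
// weight each level y_j = j * dy by the measure of { x : y_j <= f(x) < y_j + dy }.
// The measure is estimated by counting midpoints of a fine grid on [a, b];
// f is assumed to take values in [0, yMax].
const lebesgueIntegral = (f, a, b, yMax, k, gridSize) => {
    const dy = yMax / k;
    const dx = (b - a) / gridSize;
    const measure = new Array(k).fill(0);
    for (let i = 0; i < gridSize; i++) {
        const level = Math.min(k - 1, Math.floor(f(a + (i + 0.5) * dx) / dy));
        measure[level] += dx;
    }
    let sum = 0;
    for (let j = 0; j < k; j++) sum += (j * dy) * measure[j];
    return sum;
};

const square = (x) => x * x;
console.log(riemannIntegral(square, 0, 1, 1000));
console.log(lebesgueIntegral(square, 0, 1, 1, 1000, 100000));
```
</pre>
Because it is a lower sum, the second value stays slightly below 1/3, by at most the level width dy times the length of the interval.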

<br><a href="https://en.wikipedia.org/wiki/Lebesgue%E2%80%93Stieltjes_integration" target="_blank">
https://en.wikipedia.org/wiki/Lebesgue%E2%80%93Stieltjes_integration</a>
<br><a href="https://en.wikipedia.org/wiki/Measure_(mathematics)" target="_blank">
https://en.wikipedia.org/wiki/Measure_(mathematics)</a>
<br><a href="https://www.stat.berkeley.edu/~wfithian/courses/stat210a/measure-theory-basics.html" target="_blank">
https://www.stat.berkeley.edu/~wfithian/courses/stat210a/measure-theory-basics.html</a>
<br><a href="https://www.youtube.com/watch?v=TG67nsccqeQ" target="_blank">
https://www.youtube.com/watch?v=TG67nsccqeQ</a>

<br>
<br> <b>Optional</b>

<br><b>SSL/TLS Certificate Transparency Stat Analysis</b>
<br>
<br>Analyze publicly available SSL/TLS certificate data to identify potential security insights and patterns.
<br>
<br>Collect a sample of certificates and do some statistical processing on their features:
<br>
<br>Extract key statistical information:
<br>
<br>Certificate issuer distribution
<br>Certificate validity periods
<br>Geographic distribution of certificates
<br>Types of encryption used
<br>...
<br>
<br><b>Statistical Analysis</b>:
<br>
<br>Example, calculate:
<br>
<br>Mean and median certificate validity duration
<br>Most common certificate authorities
<br>Distribution of key lengths
<br>Proportion of short vs. long-lived certificates
<br>
<br>Potential Insights:
<br>Identify potential security risks
<br>Detect unusual certificate patterns
<br>Compare certificate practices across different domains/industries
<br>
<br>... or anything you find interesting to study
<br>
<br>Example:
<br>Pie charts of certificate issuers
<br>Bar graphs of key lengths
<br>Timeline of certificate expirations
<br>

<pre>

Core Functionality in VS:
Imports System.Security.Cryptography.X509Certificates

Primary Certificate Management Classes
- X509Certificate
- X509Certificate2
- X509Store
- X509Chain

RSA Roles:
1. Key Generation
2. Public/Private Key Pair
3. Digital Signature
4. Encryption Mechanism
5. Certificate Signing

RSA Versions/Key Lengths:
1. RSA-1024 (Deprecated)
2. RSA-2048 (Current Standard)
3. RSA-4096 (High Security)

Current SSL/TLS Versions:
 
Active Versions:
1. TLS 1.2 (Widely Used)
2. TLS 1.3 (Latest Recommended)

Deprecated:
- SSL 3.0 (Obsolete)
- TLS 1.0 (Insecure)
- TLS 1.1 (Deprecated)

TLS 1.3 Key Improvements:

tools:
SSL Labs (https://www.ssllabs.com/ssltest/)


SSL (Secure Sockets Layer)
   ↓
TLS (Transport Layer Security)
   - SSL 3.0 → TLS 1.0
   - Developed by IETF
   - Successor to SSL
   
   Certificate Components:
   
- Public Key
- Private Key (kept secret by the owner; not contained in the certificate itself)
- Digital Signature
- Encryption Algorithm
   
RSA in TLS Handshake (RSA key exchange, TLS 1.2 and earlier; TLS 1.3 removed it):

Phases:
1. Key Exchange
2. Initial Authentication
3. Symmetric Key Establishment

Detailed Handshake Process:

Client Hello → Server Hello
↓ RSA Used for:
- Initial Key Exchange
- Certificate Authentication
- Asymmetric Encryption of Shared Secret

After Handshake:
- Switch to Symmetric Encryption (Faster)
- Uses Session Key

Handshake Mechanism:

1. Asymmetric Encryption (RSA)
   - Slow but Secure
   - Used for Initial Key Exchange

2. Symmetric Encryption (AES)
   - Fast
   - Used for Actual Data Transfer
   
Technical Workflow:

Client Steps:
1. Generate Random Premaster Secret
2. Encrypt with Server's Public RSA Key
3. Send Encrypted Premaster Secret

Server Steps:
1. Decrypt Premaster Secret using Private RSA Key
2. Derive Session Keys
3. Establish Symmetric Encryption

</pre>
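<br>The client and server steps of the technical workflow above can be imitated with the .NET RSA class. This is only a sketch of the encrypt/decrypt round trip, not a TLS implementation: the 48 random bytes merely stand in for the premaster secret, PKCS#1 v1.5 padding is chosen because the TLS 1.2 RSA key exchange uses it, and the names PremasterSketch, EncryptPremaster and DecryptPremaster are assumptions made for illustration.
<pre>
```vb
Imports System
Imports System.Linq
Imports System.Security.Cryptography

Module PremasterSketch
    ' Client side: encrypt the premaster secret with the server's public RSA key
    Function EncryptPremaster(serverPublicKey As RSA, premaster As Byte()) As Byte()
        Return serverPublicKey.Encrypt(premaster, RSAEncryptionPadding.Pkcs1)
    End Function

    ' Server side: recover the premaster secret with the private RSA key
    Function DecryptPremaster(serverPrivateKey As RSA, encrypted As Byte()) As Byte()
        Return serverPrivateKey.Decrypt(encrypted, RSAEncryptionPadding.Pkcs1)
    End Function

    Sub Main()
        Using serverKey As RSA = RSA.Create(2048)
            Using clientView As RSA = RSA.Create()
                ' The client only ever sees the public half of the server key
                clientView.ImportParameters(serverKey.ExportParameters(False))

                ' 1. Client generates a random premaster secret (48 bytes)
                Dim premaster(47) As Byte
                Using rng As RandomNumberGenerator = RandomNumberGenerator.Create()
                    rng.GetBytes(premaster)
                End Using

                ' 2.-3. Client encrypts and "sends" it; the server decrypts it
                Dim encrypted As Byte() = EncryptPremaster(clientView, premaster)
                Dim recovered As Byte() = DecryptPremaster(serverKey, encrypted)

                Console.WriteLine($"Ciphertext length: {encrypted.Length} bytes")
                Console.WriteLine($"Secrets match: {premaster.SequenceEqual(recovered)}")
            End Using
        End Using
    End Sub
End Module
```
</pre>
<br>With a 2048-bit key the ciphertext is 256 bytes long, whatever the size of the secret. Deriving the session keys from the shared secret is left out of the sketch.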


First Hints and some snippets:

<pre>
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Net.Http
Imports System.Text.Json
Imports System.Threading.Tasks

Public Class CertificateAnalyzer
    Private Const API_ENDPOINT As String = "https://crt.sh/?q={0}&output=json"

    ' Main method to run the analysis
    Public Shared Sub Main()
        ' List of domains to analyze
        Dim domains As String() = {"google.com", "microsoft.com", "github.com"}

        ' Analyze certificates for each domain, waiting for each request to
        ' finish so the program does not exit before the results are printed
        For Each domain In domains
            AnalyzeDomainCertificatesAsync(domain).GetAwaiter().GetResult()
        Next
    End Sub

    ' Method to fetch and analyze certificates for a specific domain
    Public Shared Async Function AnalyzeDomainCertificatesAsync(domain As String) As Task
        Try
            ' Fetch certificate data
            Dim certificates As List(Of Certificate) = Await FetchCertificatesAsync(domain)

            ' Average() fails on an empty list, so stop early when nothing was returned
            If certificates Is Nothing OrElse certificates.Count = 0 Then
                Console.WriteLine($"No certificates found for {domain}")
                Return
            End If

            ' Perform statistical analysis
            Dim analysis As CertificateAnalysis = PerformAnalysis(certificates)

            ' Display results
            DisplayResults(domain, analysis)

        Catch ex As Exception
            Console.WriteLine($"Error analyzing {domain}: {ex.Message}")
        End Try
    End Function

    ' Fetch certificates from crt.sh API
    Private Shared Async Function FetchCertificatesAsync(domain As String) As Task(Of List(Of Certificate))
        Using client As New HttpClient()
            Dim url As String = String.Format(API_ENDPOINT, Uri.EscapeDataString(domain))
            Dim response As String = Await client.GetStringAsync(url)

            ' Parse JSON response; property names are matched case-insensitively,
            ' so "issuer_name" in the JSON fills Issuer_name below
            Dim options As New JsonSerializerOptions()
            options.PropertyNameCaseInsensitive = True

            Return JsonSerializer.Deserialize(Of List(Of Certificate))(response, options)
        End Using
    End Function

    ' Perform statistical analysis on certificates
    Private Shared Function PerformAnalysis(certificates As List(Of Certificate)) As CertificateAnalysis
        Dim analysis As New CertificateAnalysis()

        ' Calculate certificate issuer distribution (most frequent issuer first)
        analysis.IssuerDistribution =
            certificates.GroupBy(Function(c) c.Issuer_name).
            Select(Function(g) New KeyValuePair(Of String, Integer)(g.Key, g.Count())).
            OrderByDescending(Function(x) x.Value).
            ToList()

        ' Calculate average validity period in days
        analysis.AverageValidityDays =
            certificates.Average(Function(c) (c.Not_after - c.Not_before).TotalDays)

        ' Count certificates per key length
        analysis.KeyLengthDistribution =
            certificates.GroupBy(Function(c) c.Pubkey_size).
            Select(Function(g) New KeyValuePair(Of Integer, Integer)(g.Key, g.Count())).
            OrderBy(Function(x) x.Key).
            ToList()

        Return analysis
    End Function

    ' Display analysis results
    Private Shared Sub DisplayResults(domain As String, analysis As CertificateAnalysis)
        Console.WriteLine($"Certificate Analysis for {domain}")
        Console.WriteLine("----------------------------")

        ' Display issuer distribution
        Console.WriteLine("Certificate Issuer Distribution:")
        For Each issuer In analysis.IssuerDistribution
            Console.WriteLine($"{issuer.Key}: {issuer.Value} certificates")
        Next

        ' Display average validity
        Console.WriteLine($"Average Certificate Validity: {analysis.AverageValidityDays:F2} days")

        ' Display key length distribution
        Console.WriteLine("Key Length Distribution:")
        For Each keyLength In analysis.KeyLengthDistribution
            Console.WriteLine($"{keyLength.Key} bits: {keyLength.Value} certificates")
        Next
    End Sub

    ' Certificate data model
    Public Class Certificate
        Public Property Issuer_name As String
        Public Property Not_before As Date
        Public Property Not_after As Date
        ' Assumption: the data source supplies a "pubkey_size" field. If the JSON
        ' has no such field, this stays 0 and the key length must be obtained elsewhere.
        Public Property Pubkey_size As Integer
    End Class

    ' Analysis results container (typed pairs: key = category, value = number of certificates)
    Public Class CertificateAnalysis
        Public Property IssuerDistribution As List(Of KeyValuePair(Of String, Integer))
        Public Property AverageValidityDays As Double
        Public Property KeyLengthDistribution As List(Of KeyValuePair(Of Integer, Integer))
    End Class
End Class

</pre>
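<br>The crt.sh query above describes logged certificates. The certificate a server currently presents, including its key length and the negotiated TLS version, can also be read directly with the X509Certificate2 class listed earlier. The following is a hedged sketch: the host name, port 443 and the module name LiveCertificateInfo are assumptions, and it needs network access to run.
<pre>
```vb
Imports System
Imports System.Net.Security
Imports System.Net.Sockets
Imports System.Security.Cryptography.X509Certificates

Module LiveCertificateInfo
    Sub Main()
        Dim host As String = "github.com"   ' example host from the domain list above

        Using client As New TcpClient(host, 443)
            Using ssl As New SslStream(client.GetStream())
                ' Perform the TLS handshake; the certificate is validated with the default rules
                ssl.AuthenticateAsClient(host)

                Using cert As New X509Certificate2(ssl.RemoteCertificate)
                    Console.WriteLine($"Issuer:     {cert.Issuer}")
                    Console.WriteLine($"Valid from: {cert.NotBefore:d} to {cert.NotAfter:d}")
                    Console.WriteLine($"Validity:   {(cert.NotAfter - cert.NotBefore).TotalDays:F0} days")
                    Console.WriteLine($"Protocol:   {ssl.SslProtocol}")

                    ' GetRSAPublicKey returns Nothing when the certificate does not use RSA
                    Dim rsaKey = cert.GetRSAPublicKey()
                    If rsaKey IsNot Nothing Then
                        Console.WriteLine($"RSA key:    {rsaKey.KeySize} bits")
                    Else
                        Console.WriteLine($"Key type:   {cert.PublicKey.Oid.FriendlyName} (not RSA)")
                    End If
                End Using
            End Using
        End Using
    End Sub
End Module
```
</pre>
<br>Running this over a list of hosts would give one observation per host for the key-length and validity statistics; many sites now use elliptic-curve keys, which is why the non-RSA branch is included.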

<!--

<br><b>Homework 9</b>
<br>
<br><b>Theory/Research</b>
<br>
Research: Recall the fundamental ideas of the main encryption methods and their statistical properties


Homework Assignment: AES-Inspired Encryption (didactic "toy" version)

Apply Statistics to learn about encryption and decryption using a simple substitution and permutation cipher. 
Gain insight into the fundamentals of encryption, key management, and frequency analysis, similar to concepts used in AES and RSA, 
particularly focusing on how these methods affect frequency distribution and entropy.

Instructions:

-- Part 1A: Create a Substitution Cipher

Generate a Substitution Key. For instance, create a random mapping of the letters A-Z. Each letter should map to a unique letter.

Example:  

A -> Q, B -> Z, C -> X, D -> W, E -> V, F -> U, G -> T, 
H -> R, I -> S, J -> P, K -> O, L -> N, M -> M, 
N -> L, O -> K, P -> J, Q -> I, R -> H, S -> G, 
T -> F, U -> E, V -> D, W -> C, X -> B, Y -> A, Z -> Y

Choose a Message:

Pick a short message to encrypt.
Example: "HELLO WORLD".

Encrypt the Text:
Use your substitution key to transform each letter of your message. Write down the encrypted message.
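A minimal VB.NET sketch of this encryption step, using the example key above written as a 26-letter string (position i holds the replacement for the i-th letter of the alphabet). The module and function names are illustrative assumptions, and characters outside A-Z, such as the space, are simply copied.

```vb
Imports System
Imports System.Text

Module SubstitutionCipher
    Const Alphabet As String = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    ' Example key: A maps to Q, B maps to Z, C maps to X, and so on
    Const SubstitutionKey As String = "QZXWVUTRSPONMLKJIHGFEDCBAY"

    ' Replace every letter A-Z by its partner in the key; keep other characters
    Function Encrypt(message As String) As String
        Dim output As New StringBuilder()
        For Each ch As Char In message.ToUpperInvariant()
            Dim index As Integer = Alphabet.IndexOf(ch)
            output.Append(If(index >= 0, SubstitutionKey(index), ch))
        Next
        Return output.ToString()
    End Function

    Sub Main()
        Console.WriteLine(Encrypt("HELLO WORLD"))   ' prints RVNNK CKHNW
    End Sub
End Module
```

With the example key, "HELLO WORLD" becomes "RVNNK CKHNW": the repeated L is still visible as a repeated N, which is exactly what frequency analysis exploits.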

Statistical Analysis After Part 1A:

Frequency Distribution: Analyze the final encrypted message. Compare the frequency of letters in your original message and the encrypted message. 
Discuss how the substitution cipher affects the distribution of letters.
Entropy: Calculate the entropy of both the original and the encrypted messages. Discuss how the substitution affects the amount of uncertainty or randomness in the message.
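The letter frequencies and the Shannon entropy, H = -sum(p * log2(p)) over the relative frequencies p of the letters, can be computed as sketched below. The names FrequencyStats, LetterCounts and Entropy are assumptions made for illustration, and non-letters such as spaces are ignored; the second test message is "HELLO WORLD" encrypted with the example key.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module FrequencyStats
    ' Count how often each letter occurs (non-letters are ignored)
    Function LetterCounts(message As String) As Dictionary(Of Char, Integer)
        Return message.ToUpperInvariant().
            Where(Function(ch) Char.IsLetter(ch)).
            GroupBy(Function(ch) ch).
            ToDictionary(Function(g) g.Key, Function(g) g.Count())
    End Function

    ' Shannon entropy in bits per letter
    Function Entropy(message As String) As Double
        Dim counts As Dictionary(Of Char, Integer) = LetterCounts(message)
        If counts.Count = 0 Then Return 0
        Dim total As Double = counts.Values.Sum()
        Return -counts.Values.Sum(Function(n) (n / total) * Math.Log(n / total, 2))
    End Function

    Sub Main()
        For Each message In {"HELLO WORLD", "RVNNK CKHNW"}
            Console.WriteLine($"{message}: H = {Entropy(message):F3} bits/letter")
        Next
    End Sub
End Module
```

Both messages give the same value, about 2.646 bits per letter, because a substitution only relabels the letters and leaves the counts (1, 1, 3, 2, 1, 1, 1) unchanged.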


-- Part 1B: Permutation Step

Reverse the order of the encrypted letters to create the final encrypted output. Example: If your encrypted message is "RVNNK CKHNW" (the message "HELLO WORLD" under the example key above), reverse it to "WNHKC KNNVR".

Statistical Analysis After Part 1B:
Impact on Distribution and Entropy: Discuss how reversing the order of letters affects the frequency distribution and entropy. Does it reveal or obscure any patterns?


-- Part 2: Decrypting Exercise. Encryption/Decryption Challenge.

Exchange encrypted messages with a classmate. Attempt to decode each other's messages as a challenge using an encrypted message such as "WNHKC KNNVR".

Guess the original message using frequency analysis or pattern recognition. If you know the substitution key, decode it by reversing both the substitution and the permutation.
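When the key is known, decryption undoes the two steps in the opposite order: first the reversal, then the substitution. A minimal sketch, assuming the example key from Part 1A (under which "HELLO WORLD" encrypts and reverses to "WNHKC KNNVR"); the module and function names are illustrative.

```vb
Imports System
Imports System.Text

Module SubstitutionDecrypt
    Const Alphabet As String = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    Const SubstitutionKey As String = "QZXWVUTRSPONMLKJIHGFEDCBAY"

    Function Decrypt(cipherText As String) As String
        ' Undo the permutation step: restore the original character order
        Dim chars As Char() = cipherText.ToCharArray()
        Array.Reverse(chars)

        ' Undo the substitution: find each letter in the key and take the
        ' alphabet letter at the same position
        Dim output As New StringBuilder()
        For Each ch As Char In chars
            Dim index As Integer = SubstitutionKey.IndexOf(ch)
            output.Append(If(index >= 0, Alphabet(index), ch))
        Next
        Return output.ToString()
    End Function

    Sub Main()
        Console.WriteLine(Decrypt("WNHKC KNNVR"))   ' prints HELLO WORLD
    End Sub
End Module
```

Without the key, the same loop can be used to test a guessed mapping obtained from frequency analysis.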

Statistical Discussion:
Frequency Distribution Changes: How did the frequency distribution of letters change after applying the permutation step? Discuss the significance of this 
change in terms of statistical analysis and cryptography. 
Entropy Considerations: Discuss how the overall entropy of the original and final messages compares. What does this indicate about the security and unpredictability 
of the encrypted message compared to the original?

Conclusion
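Comparison with AES and modular exponentiation (used by RSA): In your discussions, compare how AES significantly alters the frequency distribution and entropy of plaintext, 
making reverse engineering through frequency analysis difficult. In contrast, textbook RSA (modular exponentiation without random padding) is deterministic: encrypting a message letter by letter maps equal letters to equal ciphertext values, so the frequency distribution is preserved. This is why practical RSA adds random padding and is used to encrypt keys rather than text.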
Final Thoughts on Entropy and Security: Reflect on the importance of entropy in cryptography. Consider how higher entropy in an encrypted message can enhance security by making it harder 
for attackers to predict or analyze the message content.

Note: In cryptographic contexts, a "key" is generally defined as a piece of information used by an algorithm to perform encryption and decryption. It is not necessarily an array of 
parameters; instead, a key can take various forms, such as a simple numeric value (like the shift in a Caesar cipher), a collection of values (such as the modulus and exponent of an RSA key), 
or a more complex structure (like a substitution table in substitution ciphers).


-->


 


 

<!-- 

### Modified Homework for AES-Inspired Encryption

---

**Theory/Research**  
Research: Recall the fundamental ideas of the main encryption methods and their statistical properties.

**Homework Assignment: AES-Inspired Encryption, Didactical "Toy" Version**

Apply statistics to learn about encryption and decryption using a simple substitution and permutation cipher, along with an XOR operation. Gain insight into the fundamentals of encryption, key management, and frequency analysis in a way that resembles the structure and principles of AES.

---

### Instructions:

#### Overview of AES-inspired Steps:

1. **Key Expansion** (substitution cipher key)
2. **Initial AddRoundKey** (XOR operation)
3. **SubBytes** (substitution)
4. **ShiftRows** (permutation)
5. **MixColumns** (optional complexity)
6. **Final AddRoundKey** (XOR operation)
7. **Decrypting Exercise**

---

### Detailed Steps:

#### Part 1A: Key Expansion

1. **Generate a Substitution Key**: 
   Create a random mapping of the letters A-Z. Each letter should map to a unique letter.

   **Example**:  
   ```
   A -> Q, B -> Z, C -> X, D -> W, E -> V, F -> U, G -> T, 
   H -> R, I -> S, J -> P, K -> O, L -> N, M -> M, 
   N -> L, O -> K, P -> J, Q -> I, R -> H, S -> G, 
   T -> F, U -> E, V -> D, W -> C, X -> B, Y -> A, Z -> Y
   ```

---

#### Part 1B: Initial AddRoundKey (XOR Operation)

1. **Choose a Simple Key**: 
   Select a simple key for the XOR operation. Use a short sequence of letters, such as "KEY".

2. **Apply XOR with Key**: 
   Convert each character of your message into numerical values (e.g., using ASCII values) and perform an XOR operation between the key and your original message. 

   For example, with the original message "HELLO WORLD":
   - Message: H E L L O   W O R L D
   - XOR Key: K E Y K E   Y K E Y K
   - Result: Apply the XOR operation for each position.

---

#### Part 2: SubBytes (Substitution)

1. **Substitute the Message**: 
   Use your substitution key to transform each letter of your XOR'ed message. Write down the encrypted message.

---

#### Part 3: ShiftRows (Permutation Step)

1. **Reverse the Order**: 
   Reverse the order of the substituted letters to create the permuted output. For instance, if your substituted message is "RVNNK CKHNW", reverse it to "WNHKC KNNVR".

---

#### Part 4: Final AddRoundKey (XOR Operation)

1. **XOR Again for the Final Encrypted Result**: 
   As in AES, which ends encryption with a final AddRoundKey, apply another XOR with the same key used earlier to the permuted message. The result is the final encrypted output.

#### Statistical Analysis After Each Step:

1. **Frequency Distribution**: Analyze the letter frequencies after the XOR operation, substitution, and permutation. Discuss changes at each stage.
  
2. **Entropy**: Calculate the entropy of messages after each step. Discuss how these transformations affect the uncertainty or randomness of the message at each stage.

---

#### Part 5: Decrypting Exercise

1. **Exchange Encrypted Messages**: 
   Exchange encrypted messages with classmates. Attempt to decode each other's messages using the steps above.

2. **Decrypt**: 
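   If you know the substitution key and the original XOR key, decode by applying the inverse of each operation in reverse order: XOR with the key (XOR is its own inverse), undo the reversal, apply the inverse substitution (inverse of SubBytes), then XOR with the key once more.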
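
The XOR step is undone by applying the same XOR again, since (m XOR k) XOR k = m. A short sketch of that round trip, assuming ASCII text and a repeating key; the names XorRoundTrip and XorBytes are illustrative.

```vb
Imports System
Imports System.Text

Module XorRoundTrip
    ' XOR every data byte with the key byte at the same position (key repeats)
    Function XorBytes(data As Byte(), keyBytes As Byte()) As Byte()
        Dim output(data.Length - 1) As Byte
        For i As Integer = 0 To data.Length - 1
            output(i) = data(i) Xor keyBytes(i Mod keyBytes.Length)
        Next
        Return output
    End Function

    Sub Main()
        Dim keyBytes As Byte() = Encoding.ASCII.GetBytes("KEY")
        Dim plain As Byte() = Encoding.ASCII.GetBytes("HELLOWORLD")

        Dim cipher As Byte() = XorBytes(plain, keyBytes)       ' encrypt
        Dim recovered As Byte() = XorBytes(cipher, keyBytes)   ' decrypt: same operation

        Console.WriteLine(BitConverter.ToString(cipher))
        Console.WriteLine(Encoding.ASCII.GetString(recovered)) ' prints HELLOWORLD
    End Sub
End Module
```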

### Statistical Discussion:

- **Frequency Distribution Changes**: How did the frequency distribution of letters change after each step? Discuss the significance of these changes in terms of statistical analysis and cryptography.

- **Entropy Considerations**: Discuss how overall entropy varies across the original and the final messages. What does this indicate about the security and unpredictability of the final message?

---

### Conclusion

- **Comparison with AES and Modular Exponentiation (used by RSA)**:  
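  Highlight how AES combines key mixing (AddRoundKey), substitution (SubBytes, which provides confusion) and permutation (ShiftRows and MixColumns, which provide diffusion) over several rounds to flatten the frequency distribution and raise entropy, making reverse engineering challenging. Discuss how the XOR, substitution and reversal steps of this exercise correspond to those roles.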

- **Final Thoughts on Entropy and Security**:  
  Reflect on the importance of entropy in cryptography. Consider how higher entropy can improve security by making it more difficult for attackers to predict message content.

---

### Note: 
In cryptographic contexts, a "key" refers to a piece of information used by an algorithm for encryption and decryption. It can take various forms, such as numeric values, character sequences, or substitution tables.

---



The XOR operation step is clarified here with a simple, concrete example that applies XOR between the letters of a message and a key.

### Example of XOR Operation

1. **Original Message**: "HELLO WORLD"
2. **XOR Key**: Use a repeating key like "KEY" (when the key is shorter than the message, repeat it).

### Step 1: Align the Key with the Message

To perform the XOR operation between the ASCII values of each character in the message and the key, you first need to align the key with the original message, repeating as necessary.

```
Message:     H  E  L  L  O    W  O  R  L  D
Key:         K  E  Y  K  E    Y  K  E  Y  K
```

### Step 2: Convert Characters to ASCII Values

Next, convert each character to its corresponding ASCII value. Here are some values for reference:

- 'H' = 72
- 'E' = 69
- 'L' = 76
- 'O' = 79
- 'W' = 87
- 'R' = 82
- 'D' = 68
- 'K' = 75
- 'Y' = 89

So, the conversion will look like this:

```
Message ASCII:   72  69  76  76  79    87  79  82  76  68
Key ASCII:       75  69  89  75  69    89  75  69  89  75
```

### Step 3: Apply the XOR Operation 

Now, we perform the XOR operation between the ASCII values of the message and the corresponding key values. The XOR operation is defined as follows:

- **XOR Operation**: For two bits, the XOR is true (1) if the bits are different, and false (0) if the bits are the same.
- Applied to two ASCII values, XOR is computed bit by bit on their binary representations.

Here’s how it looks:

```
Resulting ASCII: 
(72 XOR 75)  (69 XOR 69)  (76 XOR 89)  (76 XOR 75)  (79 XOR 69)
(87 XOR 89)  (79 XOR 75)  (82 XOR 69)  (76 XOR 89)  (68 XOR 75)
```

### Step 4: Calculate the Resulting Values

Now let's calculate the XOR values step by step:

- `72 XOR 75 = 3`
- `69 XOR 69 = 0`
- `76 XOR 89 = 21`
- `76 XOR 75 = 7`
- `79 XOR 69 = 10`
- `87 XOR 89 = 14`
- `79 XOR 75 = 4`
- `82 XOR 69 = 23`
- `76 XOR 89 = 21`
- `68 XOR 75 = 15`

For instance, 72 is 1001000 in binary and 75 is 1001011; the two differ only in the last two bits, so 72 XOR 75 = 0000011 = 3.

So the resulting ASCII values are:

```
Resulting ASCII: 3   0   21   7   10   14   4   23   21   15
```

### Step 5: Convert Back to Characters

Finally, convert the resulting ASCII values back to characters. Because the message and the key both consist of uppercase letters (codes 65 to 90), every result is below 32 and is therefore a non-printable control character:

- ASCII 3   -> End of Text (non-printable)
- ASCII 0   -> Null (non-printable)
- ASCII 21  -> Negative Acknowledge (non-printable)
- ASCII 7   -> Bell (non-printable)
- ASCII 10  -> Line Feed (non-printable)
- ASCII 14  -> Shift Out (non-printable)
- ASCII 4   -> End of Transmission (non-printable)
- ASCII 23  -> End of Transmission Block (non-printable)
- ASCII 21  -> Negative Acknowledge (non-printable)
- ASCII 15  -> Shift In (non-printable)

None of the resulting characters is readable, which is typical of encryption processes; encrypted outputs are not necessarily human-readable.

### Final Output

After performing the XOR operation with your key, you can write down the final output. However, keep in mind that because the output includes non-printable characters, such representation is not suitable for communication. Instead, in a real encryption scenario, encoded binary data or hex values would be used for transmitting encrypted messages.
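
The whole XOR example can be reproduced, and its output written as hex values, with a short program. This is a sketch under the same conventions as the alignment in Step 1 (spaces are skipped and do not consume a key letter); the names XorExample and XorWithKey are assumptions.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module XorExample
    ' XOR each non-space character code with the code of the next key letter
    Function XorWithKey(message As String, keyText As String) As Byte()
        Dim result As New List(Of Byte)()
        Dim keyIndex As Integer = 0
        For Each ch As Char In message
            If ch = " "c Then Continue For
            Dim keyChar As Char = keyText(keyIndex Mod keyText.Length)
            result.Add(CByte(Convert.ToInt32(ch) Xor Convert.ToInt32(keyChar)))
            keyIndex += 1
        Next
        Return result.ToArray()
    End Function

    Sub Main()
        Dim values As Byte() = XorWithKey("HELLO WORLD", "KEY")

        ' Decimal values: 3 0 21 7 10 14 4 23 21 15
        Console.WriteLine(String.Join(" ", values.Select(Function(b) b.ToString())))

        ' Hex values suitable for transmission: 03-00-15-07-0A-0E-04-17-15-0F
        Console.WriteLine(BitConverter.ToString(values))
    End Sub
End Module
```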

 
-->
<br>
<br>
<br>
<br>
<br>
<br>
</body>

</html>