Notes on the Online Cryptography Course from Stanford.
https://crypto.stanford.edu/~dabo/courses/OnlineCrypto/
Table of Contents
 What is cryptography about?
 Crash course in discrete probability
 Stream ciphers 1: The onetime pad and stream ciphers
 Stream ciphers 2: Attacks and common mistakes
 Stream ciphers 3: Realworld examples
 Stream ciphers 4: What is a secure cipher?
 Closing notes
What is cryptography about?
What is cryptography
Cryptography is more than just encryption and decryption.
Cryptography is about (a) secure key establishment and (b) secure communications.
Secure key establishment amounts to Alice and bob agreeing on a shared key in such a way that an attacker has no idea what this key might be.
Secure communications means that:
 Alice and Bob know that they are talking to each other
 Their messages are confidential — no one can snoop in on what they are saying.
 Their messages cannot be tampered with
With the core out of the way, we can do some more interesting stuff.
 Digital signature: How can we make the equivalent of physical signatures in digital world while making sure an attacker cannot just copy and paste our signature elsewhere?
 Anonymous communication: How can Alice interact with a server without the server knowing anything about who interacted with it? (e.g. mix net)
 Anonymous digital cash: How can we bring the anonymity of cash into the digital world while making sure people won’t double spend?
 Secure multiparty computation
 Given some input from different parties, how do you compute a result without revealing anything else about the inputs? This has applications in elections for example.
 There is a theorem: "Anything that can be done with a trusted authority can also be done without"
 Crypto Magic!
 Privately outsourcing computation
 Zeroknowledge proofofknowledge
 There are ways to prove that you have a solution to a problem, without revealing anything about the solution itself.
Cryptography will be approached in the following way:
 Precisely define the threat model: Who is the attacker? What are they trying to do? What are their goals? Define
 Propose a construction
 Prove that if an attacker is attacking our construction under our threat model, their attack can be used to attack an underlying unsolvable/hard problem. Basically, we are outsourcing the proof of security to another field (like mathematics). Nice.
History of cryptography
THE history of cryptography: The Code Breakers by David Kahn
Crash course in discrete probability
Cryptographic constructs are always accompanied by a proof of security.
Discrete probability is used as the language of such proofs.
The "discrete" part simply means that the number of possible things that can happen can be counted, it is not infinite.
This is in contrast to continuous probability which is it’s own beast that I hopefully will not be coming across in this course.
 Event independence
 Events A & B are independent if P[A and B] = P[A] * P[B]
 XOR is important because of an odd? property it has:
 Given a uniform random variable X, if you XOR it with another random variable Y (does not matter what it is), the result is always a uniform random variable
Stream ciphers 1: The onetime pad and stream ciphers
Information theoretic security and the onetime pad
So what is a cipher? A cipher is a set of two functions, encryption and decryption.
Encryption takes a key and some message and produces ciphertext
Decryption takes a key and ciphertext and produces a message
What makes a secure cipher ?
The father of information Claude Shannon has our backs here.
In plain English:
The cipher text should reveal no information about the plain text.
In the language of probability:
For every pair of messages m0 and m1, the probability of encrypting m0 and getting a cipher text c should be the same as the probability of encrypting m1 and getting that same cipher text c.
This makes sense intuitively. You shouldn’t be able to use such things as letter frequency to analyze how likely it is that certain cipher text symbols correspond to plain text.
A cipher that meets this requirement is said to have perfect secrecy. It leaks no information. It tells no tales.
 The One Time Pad (Vernam 1917)
 the key is a random bit string as long as the message
 encryption is XORing the key with the message.
 decryption is XORing the key and the ciphertext
 Not practical because you need a way to securely communicate the key beforehand. But if you have a way to do that, then you could use it to transmit the message too.
 Given our above definition, the onetime pad has perfect secrecy.
 This is not the end of the story though. Perfect secrecy only guarantees that the cipher is not vulnerable to cipher text only attacks.
Claude Shannon also proved (unfortunately) that a perfectly secure cipher must have keys that are at least as long as the messages.
The onetimepad satisfies this, but this certainly throws a wrench in things.
Stream ciphers and pseudo random generators
The stream cipher is an attempt to make a usable onetimepad implementation.
It works like so:
 The key is some reasonable nbit number
 The encryption process takes that key and uses it as a seed for a Pseudorandom generator that produces a value as long as the message
 That value is then used to encrypt the message.
Stream ciphers does not pass our perfect secrecy test. The key is not the same size as the message.
A different metric for a cipher’s security is needed that can help characterize how secure stream ciphers are.
We say that if a PRG is predictable, then it is not secure. Unpredictability is the bar. PRG Predictability: A PRG is predictable if given some number of bits n of the PRG’s output, an attacker can predict what bit n + 1 would be.
There is a more formal definition, but it uses "negligible functions" which I have not completely understood.
Stream ciphers 2: Attacks and common mistakes
 Twotime pad:
 An attacker who gets a hold of two messages that have been encrypted using the same onetime pad can recover both messages.
 Related keys
 The example is WEP:
 The initialization vector was reset every 16 million frames or so, as well as every powerreset. This meant that there was a lot of opportunity for an attacker to observe twotime pads!
 Worse, the PRG used did not do well when the keys were closely related (which they were, the IV was just being incremented). And there was an attack on it which allowed the attacker to retrieve the secret key after recording some number of messages.
 The example is WEP:
 Ciphertext Integrity
 One problem with the onetime pad is that it does not guarantee integrity. An attacker could intercept the ciphertext, mess with it and send it along.
 This could be especially bad when the attacker knows something about the content of the ciphertext (maybe they for sure know the format of the original message). They could easily inject or modify the message in a way that is confusing at best, harmful at worst.
Stream ciphers 3: Realworld examples

RC4: Do not use this. However, this is still fairly widely used. Was used for WEP and is part of the reason the WEP wifi encryption is not secure.
 It’s weak because of the bias of its output. PRGs should be random, but RC4 has some weird biases.
 The probability of the first few bytes being 0 is 2/256. This means there is a higher probability that your first couple of bytes would not be encrypted at all(i.e 0 XOR anything doesn’t change anything)
 If you must use RC4 disgard the first 256 bytes it outputs.
 The probability of seeing two 0 bytes in a row is higher than you would imagine in a completely random process. This bias appears after a few Gigs of the generator, but this pattern makes it easy to distinguish from others.
 The probability of the first few bytes being 0 is 2/256. This means there is a higher probability that your first couple of bytes would not be encrypted at all(i.e 0 XOR anything doesn’t change anything)
 It’s weak because of the bias of its output. PRGs should be random, but RC4 has some weird biases.

CSS (Content Scrambling System): Used to encrypt DVDs
 Based on a Linear Feedback Shift Register (supposed to be easy to implement in hardware). It uses two LSFRs as its PRG
 Weaknesses:
 Small key size that makes it easy to bruteforce
 Format of movies are known, so in addition to the cipher text, the attacker might actually know what the initial plain text looks like.
XOR that with the actual cipher and you get the initial segment of the PRG. Brute force in a smart way and you can work out the initial settings of the PRG (i.e. the key)

eStream
 State of the art (for now). No significant attacks…for now. Seems to be unpredictable.
 Introduces another argument to the PRG called a nonce – a Number used Once
 It’s actually a group of stream ciphers/PRGs that satisfy various metrics (e.g. being easy to implement on hardware vs software)
 One of such ciphers is called Salsa20
Stream ciphers 4: What is a secure cipher?
Security Definitions:
A statistical test is a small function that outputs 0 or 1 based on an observed property of its input. In the context of a pseudorandom generator (PRG), we can write statistical tests that represent heuristics about what we consider a truly random output. The statistical test will output 1 if it thinks the input is random, other wise it will output 0.
To get a sense of how well a PRG performs, we need to compare it to a truly random output. Formally, this comparison is called the "Advantage of the statistical test relative to the generator".
A generator is secure if for all "efficient" statistical tests, the advantage negligible.
This definition is a pie in the sky though because we can’t actually prove that a particular PRG is secure (probably because there is no way we can list "all" possible statistical tests). I don’t know about you, but this is a disappointing turn of events. We went through all the trouble of formally defining a secure generator only to discover that we can’t actually create one. We can only come closer and closer to one by adding new statistical tests.
Two (useless?) facts (since we can’t actually get our hands on a secure generator)
 A secure generator is unpredictable
 An unpredictable generator is secure
 If nextbit predictors cannot distinguish the generator from random, then no statistical test can.
Two probability distributions P1 and P2 over the set of all nbit strings are computationally indistinguishable if for all "efficient" statistical tests, the advantage of P1, relative to P2 is negligible.
Semantic Security
Semantic security is a weaker form of perfect secrecy.
Recall that perfect secrecy requires that the probability distribution of encrypting m0 and getting cipher text c be the same as the probability distribution of encrypting m1 and getting the same cipher text c
Semantic security (under a onetime key) says that for all pairs of plain text m0, m1 that the attacker can come up with, the probability distribution of encrypting m0 and getting cipher text c is computationally indistinguishable from the probability distribution of encrypting m1 and getting the same cipher text c
The professor later goes on to show that given a secure PRG G, a stream cipher that uses G is semantically secure (under a onetime key)
Closing notes
Mixing formal mathematical definitions with cryptography is interesting. On one hand you get precise definitions of the goals that cryptographic primitives need to achieve. But on the other hand, it is not yet clear how to 100% achieve these primitives.
So far the concepts of a secure PRG and computational indistinguishability for example, seem like they are out of reach because they require coming up with an exhaustive list of all efficient statistical tests. Practically, there is no way to prove that we have the complete list. This seems to mean that the security of practical cryptographic primitives is temporary. It’s secure until someone comes up with a statistical test with nonnegligible advantage.