Course notes: Cryptography I (Week 1)

Notes on the Online Cryptography Course from Stanford.
https://crypto.stanford.edu/~dabo/courses/OnlineCrypto/

Table of Contents

  1. What is cryptography about?
  2. Crash course in discrete probability
  3. Stream ciphers 1: The one-time pad and stream ciphers
  4. Stream ciphers 2: Attacks and common mistakes
  5. Stream ciphers 3: Real-world examples
  6. Stream ciphers 4: What is a secure cipher?
  7. Closing notes

What is cryptography about?

What is cryptography

Cryptography is more than just encryption and decryption.

Cryptography is about (a) secure key establishment and (b) secure communications.

Secure key establishment amounts to Alice and bob agreeing on a shared key in such a way that an attacker has no idea what this key might be.

Secure communications means that:

  • Alice and Bob know that they are talking to each other
  • Their messages are confidential — no one can snoop in on what they are saying.
  • Their messages cannot be tampered with

With the core out of the way, we can do some more interesting stuff.

  • Digital signature: How can we make the equivalent of physical signatures in digital world while making sure an attacker cannot just copy and paste our signature elsewhere?
  • Anonymous communication: How can Alice interact with a server without the server knowing anything about who interacted with it? (e.g. mix net)
  • Anonymous digital cash: How can we bring the anonymity of cash into the digital world while making sure people won’t double spend?
  • Secure multi-party computation
    • Given some input from different parties, how do you compute a result without revealing anything else about the inputs? This has applications in elections for example.
    • There is a theorem: "Anything that can be done with a trusted authority can also be done without"
  • Crypto Magic!
    • Privately outsourcing computation
    • Zero-knowledge proof-of-knowledge
      • There are ways to prove that you have a solution to a problem, without revealing anything about the solution itself.

Cryptography will be approached in the following way:

  • Precisely define the threat model: Who is the attacker? What are they trying to do? What are their goals? Define
  • Propose a construction
  • Prove that if an attacker is attacking our construction under our threat model, their attack can be used to attack an underlying unsolvable/hard problem. Basically, we are outsourcing the proof of security to another field (like mathematics). Nice.

History of cryptography

THE history of cryptography: The Code Breakers by David Kahn

Crash course in discrete probability

Cryptographic constructs are always accompanied by a proof of security.

Discrete probability is used as the language of such proofs.

The "discrete" part simply means that the number of possible things that can happen can be counted, it is not infinite.

This is in contrast to continuous probability which is it’s own beast that I hopefully will not be coming across in this course.

Visual explaining some probability concepts
Another visual introducing probability concepts. Specifically, random variables
  • Event independence
    • Events A & B are independent if P[A and B] = P[A] * P[B]
  • XOR is important because of an odd? property it has:
    • Given a uniform random variable X, if you XOR it with another random variable Y (does not matter what it is), the result is always a uniform random variable

Stream ciphers 1: The one-time pad and stream ciphers

Information theoretic security and the one-time pad

So what is a cipher? A cipher is a set of two functions, encryption and decryption.
Encryption takes a key and some message and produces cipher-text
Decryption takes a key and cipher-text and produces a message

What makes a secure cipher ?

The father of information Claude Shannon has our backs here.

In plain English:
The cipher text should reveal no information about the plain text.

In the language of probability:
For every pair of messages m0 and m1, the probability of encrypting m0 and getting a cipher text c should be the same as the probability of encrypting m1 and getting that same cipher text c.

This makes sense intuitively. You shouldn’t be able to use such things as letter frequency to analyze how likely it is that certain cipher text symbols correspond to plain text.

A cipher that meets this requirement is said to have perfect secrecy. It leaks no information. It tells no tales.

  • The One Time Pad (Vernam 1917)
    • the key is a random bit string as long as the message
    • encryption is XORing the key with the message.
    • decryption is XORing the key and the cipher-text
    • Not practical because you need a way to securely communicate the key beforehand. But if you have a way to do that, then you could use it to transmit the message too.
    • Given our above definition, the one-time pad has perfect secrecy.
      • This is not the end of the story though. Perfect secrecy only guarantees that the cipher is not vulnerable to cipher text only attacks.

Claude Shannon also proved (unfortunately) that a perfectly secure cipher must have keys that are at least as long as the messages.
The one-time-pad satisfies this, but this certainly throws a wrench in things.

Stream ciphers and pseudo random generators

The stream cipher is an attempt to make a usable one-time-pad implementation.

It works like so:

  • The key is some reasonable n-bit number
  • The encryption process takes that key and uses it as a seed for a Pseudo-random generator that produces a value as long as the message
  • That value is then used to encrypt the message.

Stream ciphers does not pass our perfect secrecy test. The key is not the same size as the message.

A different metric for a cipher’s security is needed that can help characterize how secure stream ciphers are.

We say that if a PRG is predictable, then it is not secure. Unpredictability is the bar. PRG Predictability: A PRG is predictable if given some number of bits n of the PRG’s output, an attacker can predict what bit n + 1 would be.

There is a more formal definition, but it uses "negligible functions" which I have not completely understood.

Stream ciphers 2: Attacks and common mistakes

  • Two-time pad:
    • An attacker who gets a hold of two messages that have been encrypted using the same one-time pad can recover both messages.
  • Related keys
    • The example is WEP:
      • The initialization vector was reset every 16 million frames or so, as well as every power-reset. This meant that there was a lot of opportunity for an attacker to observe two-time pads!
      • Worse, the PRG used did not do well when the keys were closely related (which they were, the IV was just being incremented). And there was an attack on it which allowed the attacker to retrieve the secret key after recording some number of messages.
  • Cipher-text Integrity
    • One problem with the one-time pad is that it does not guarantee integrity. An attacker could intercept the cipher-text, mess with it and send it along.
    • This could be especially bad when the attacker knows something about the content of the cipher-text (maybe they for sure know the format of the original message). They could easily inject or modify the message in a way that is confusing at best, harmful at worst.

Stream ciphers 3: Real-world examples

  • RC4: Do not use this. However, this is still fairly widely used. Was used for WEP and is part of the reason the WEP wifi encryption is not secure.

    • It’s weak because of the bias of its output. PRGs should be random, but RC4 has some weird biases.
      • The probability of the first few bytes being 0 is 2/256. This means there is a higher probability that your first couple of bytes would not be encrypted at all(i.e 0 XOR anything doesn’t change anything)
        • If you must use RC4 disgard the first 256 bytes it outputs.
      • The probability of seeing two 0 bytes in a row is higher than you would imagine in a completely random process. This bias appears after a few Gigs of the generator, but this pattern makes it easy to distinguish from others.
  • CSS (Content Scrambling System): Used to encrypt DVDs

    • Based on a Linear Feedback Shift Register (supposed to be easy to implement in hardware). It uses two LSFRs as its PRG
    • Weaknesses:
      • Small key size that makes it easy to brute-force
      • Format of movies are known, so in addition to the cipher text, the attacker might actually know what the initial plain text looks like.
        XOR that with the actual cipher and you get the initial segment of the PRG. Brute force in a smart way and you can work out the initial settings of the PRG (i.e. the key)
  • eStream

    • State of the art (for now). No significant attacks…for now. Seems to be unpredictable.
    • Introduces another argument to the PRG called a nonce – a Number used Once
    • It’s actually a group of stream ciphers/PRGs that satisfy various metrics (e.g. being easy to implement on hardware vs software)
    • One of such ciphers is called Salsa20

Stream ciphers 4: What is a secure cipher?

Security Definitions:

A statistical test is a small function that outputs 0 or 1 based on an observed property of its input. In the context of a pseudo-random generator (PRG), we can write statistical tests that represent heuristics about what we consider a truly random output. The statistical test will output 1 if it thinks the input is random, other wise it will output 0.

Visual of a statistical test

To get a sense of how well a PRG performs, we need to compare it to a truly random output. Formally, this comparison is called the "Advantage of the statistical test relative to the generator".

Visual of the advantage of a statistical test

A generator is secure if for all "efficient" statistical tests, the advantage negligible.

This definition is a pie in the sky though because we can’t actually prove that a particular PRG is secure (probably because there is no way we can list "all" possible statistical tests). I don’t know about you, but this is a disappointing turn of events. We went through all the trouble of formally defining a secure generator only to discover that we can’t actually create one. We can only come closer and closer to one by adding new statistical tests.

Two (useless?) facts (since we can’t actually get our hands on a secure generator)

  • A secure generator is unpredictable
  • An unpredictable generator is secure
    • If next-bit predictors cannot distinguish the generator from random, then no statistical test can.

Two probability distributions P1 and P2 over the set of all n-bit strings are computationally indistinguishable if for all "efficient" statistical tests, the advantage of P1, relative to P2 is negligible.

Semantic Security

Semantic security is a weaker form of perfect secrecy.

Recall that perfect secrecy requires that the probability distribution of encrypting m0 and getting cipher text c be the same as the probability distribution of encrypting m1 and getting the same cipher text c

Semantic security (under a one-time key) says that for all pairs of plain text m0, m1 that the attacker can come up with, the probability distribution of encrypting m0 and getting cipher text c is computationally indistinguishable from the probability distribution of encrypting m1 and getting the same cipher text c

The professor later goes on to show that given a secure PRG G, a stream cipher that uses G is semantically secure (under a one-time key)

Closing notes

Mixing formal mathematical definitions with cryptography is interesting. On one hand you get precise definitions of the goals that cryptographic primitives need to achieve. But on the other hand, it is not yet clear how to 100% achieve these primitives.

So far the concepts of a secure PRG and computational indistinguishability for example, seem like they are out of reach because they require coming up with an exhaustive list of all efficient statistical tests. Practically, there is no way to prove that we have the complete list. This seems to mean that the security of practical cryptographic primitives is temporary. It’s secure until someone comes up with a statistical test with non-negligible advantage.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s