Distilled: How to Time-Stamp a Digital Document

In the oh-so-famous bitcoin white paper, Satoshi builds on the concept of a "timestamp server". Since I have a lot of quarantine time on my hands, I thought I would try to understand what this is a bit better. In what follows, there will be no mention of bitcoin, only a distillation of Haber and Stornetta's paper on digital time-stamping, which the bitcoin white paper references.

Why worry about time-stamping digital documents? I once needed to co-sign a lease. The catch was that the actual signing had to be done with a blue pen while video-conferencing with an agent. I did not have a blue pen at home (fight me), and they were adamant that a black pen just would not do. I had to visit them in person to perform the signing ritual. This was 2019.

With the why out of the way, let's turn to how we time-stamp a digital document. The authors offer two approaches: one centralized and the other decentralized. In a gesture that does not inspire confidence, they start off by saying that "It is not clear that either of these can be done at all".

Criteria:

  • Once a document is time-stamped, it must be impossible to change it without the change being detected: if even one bit is modified, it should be obvious that the new document differs from the original time-stamped one.
  • It must be impossible to time-stamp a document with a date and time other than the actual, current ones: any attempt at back-dating or forward-dating should be glaring.

Centralized Method

A visual representing the high-level setup for a centralized time-stamping service

In the centralized version, a Time-Stamping Service (TSS) time-stamps our documents for us. Clients send the documents they want time-stamped, and the TSS responds with a piece of data they can later use to prove their document was time-stamped.

The first criterion is satisfied by having clients hash their documents before sending them to the TSS. The TSS then time-stamps the hash rather than the document itself. Hashing takes some data as input and produces a short, fixed-size output called a hash. With a cryptographic hash function, it is computationally infeasible to find two different inputs that produce the same hash. In practice, this means any change to the input, even a single bit, produces a different hash. A hash is sometimes called a fingerprint because it serves the same purpose as a human fingerprint. Same fingerprint? Same person. Different fingerprint? Different person.
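To make the fingerprint idea concrete, here is a small sketch using Python's standard `hashlib`. SHA-256 is my choice for illustration; it is not something the paper specifies. The helper name `fingerprint` and the sample lease text are made up. The sketch shows that identical input always gives the same hash, while rewording the text or flipping one bit gives a different one.

```python
import hashlib

def fingerprint(document: bytes) -> str:
    """Return the SHA-256 hash of a document as a 64-character hex string."""
    return hashlib.sha256(document).hexdigest()

original = b"I agree to pay rent on the first of every month."
reworded = b"I agree to pay rent on the first of every other month."
# Flip the lowest bit of the first byte ('I' becomes 'H').
flipped = bytes([original[0] ^ 1]) + original[1:]

print(fingerprint(original) == fingerprint(original))  # True: same input, same hash
print(fingerprint(original) == fingerprint(reworded))  # False: edited text
print(fingerprint(original) == fingerprint(flipped))   # False: a single flipped bit
```

The client would send only `fingerprint(document)` to the TSS, never the document itself.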

Visual representation of hashing

How should the TSS respond to these client requests? Obviously, the data it sends back should include a timestamp (that's why the client sent stuff to the TSS in the first place). Less obviously, the TSS should also cryptographically sign the response, covering both the hash and the timestamp. That way the client can verify that the response actually came from the TSS and that neither part was altered. You can think of cryptographic signatures as the mathematically backed, digital version of physical signatures. So the TSS's response will look something like this:

Visual representation of the TSS reply to document time-stamping requests
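Here is a rough sketch of a signed TSS reply. Python's standard library has no public-key signatures, so I swap in textbook RSA. The key pair is built from the two known Mersenne primes 2^61 - 1 and 2^89 - 1. This is insecure and only meant to show the shape of the idea; a real TSS would use a vetted scheme such as Ed25519 from a maintained crypto library. The payload format `hash|timestamp` and the names `tss_respond` and `client_check` are hypothetical.

```python
import hashlib
import time

# Toy textbook-RSA key pair. INSECURE: illustration only.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                          # public exponent, known to every client
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, known only to the TSS (Python 3.8+)

def _digest(message: bytes) -> int:
    """Hash a message and reduce it to an integer modulo N."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """TSS side: sign with the private exponent D."""
    return pow(_digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Anyone holding the public key (N, E) can check a signature."""
    return pow(signature, E, N) == _digest(message)

def tss_respond(doc_hash: str) -> dict:
    """Hypothetical TSS: bind the client's hash to the current time and sign both together."""
    timestamp = int(time.time())
    payload = f"{doc_hash}|{timestamp}".encode()
    return {"hash": doc_hash, "timestamp": timestamp, "signature": sign(payload)}

def client_check(response: dict) -> bool:
    """Client side: rebuild the signed payload and verify the TSS's signature."""
    payload = f"{response['hash']}|{response['timestamp']}".encode()
    return verify(payload, response["signature"])

doc_hash = hashlib.sha256(b"my lease").hexdigest()
resp = tss_respond(doc_hash)
print(client_check(resp))    # True
forged = dict(resp, timestamp=resp["timestamp"] - 86400)  # back-dated by a day
print(client_check(forged))  # False: the signature no longer matches
```

Note that the signature only proves the response came from the TSS and was not edited afterwards. Nothing here stops the TSS itself from signing a false time, which is exactly what the second criterion worries about.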

To satisfy the second criterion, it feels like we need a way to constrain the TSS and only allow it to do certain things. But you can't provably do that: an attacker could find a way to make the TSS execute modified code without the clients knowing. Instead of trying to stop the TSS from being malicious, the authors suggest changing the TSS's response so that it is easy to validate that the TSS is working correctly. The response now also includes information about the previous request. It therefore carries the timestamp, the document's hash, and this linking information, all signed by the TSS.

How does this stop the TSS from back-dating or forward-dating? Well, the authors don't make it super clear why each part is needed. The way I see it: if each response contains information about the previous response, the responses form a chain whose coherence can be validated. Here, coherence means two things. Each response correctly points to the one before it, and the timestamps along the chain are in order. Additionally, the authors have the TSS also return the client ID of the next person to make a request. That way, you could theoretically go up to them and ask to see what the TSS gave them, to make sure it properly links back to your response. All this extra information lets you programmatically check the TSS's work.
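Below is a minimal sketch of a linked chain and the coherence check described above. I simplify the "information about the previous request" to a SHA-256 hash of the previous certificate; the paper's exact linking fields may differ. I also leave out signatures for brevity. `Certificate`, `LinkingTSS`, and `chain_is_coherent` are hypothetical names. The timestamp is passed in explicitly so the demo is deterministic.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import List, Optional

@dataclass
class Certificate:
    serial: int
    timestamp: int
    client_id: str
    doc_hash: str
    prev_link: str                       # hash of the previous certificate ("" for the first)
    next_client_id: Optional[str] = None  # filled in once the next request arrives

    def link_hash(self) -> str:
        """Hash the fields fixed at issue time (next_client_id is added later, so it is excluded)."""
        data = asdict(self)
        data.pop("next_client_id")
        return hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()

class LinkingTSS:
    """Hypothetical centralized TSS that chains each certificate to the previous one."""

    def __init__(self) -> None:
        self.chain: List[Certificate] = []

    def stamp(self, client_id: str, doc_hash: str, timestamp: int) -> Certificate:
        prev = self.chain[-1] if self.chain else None
        cert = Certificate(
            serial=len(self.chain),
            timestamp=timestamp,
            client_id=client_id,
            doc_hash=doc_hash,
            prev_link=prev.link_hash() if prev else "",
        )
        if prev is not None:
            prev.next_client_id = client_id  # tell the previous client who came next
        self.chain.append(cert)
        return cert

def chain_is_coherent(chain: List[Certificate]) -> bool:
    """Check every adjacent pair: hash link, timestamp order, and next-client ID."""
    for prev, cur in zip(chain, chain[1:]):
        if cur.prev_link != prev.link_hash():
            return False  # cur does not point at prev (prev was altered or skipped)
        if cur.timestamp < prev.timestamp:
            return False  # timestamps out of order
        if prev.next_client_id != cur.client_id:
            return False  # prev's "who came next" does not match
    return True

tss = LinkingTSS()
tss.stamp("alice", hashlib.sha256(b"lease").hexdigest(), timestamp=1000)
tss.stamp("bob", hashlib.sha256(b"will").hexdigest(), timestamp=1005)
tss.stamp("carol", hashlib.sha256(b"deed").hexdigest(), timestamp=1010)
print(chain_is_coherent(tss.chain))  # True

tss.chain[1].timestamp = 900  # a dishonest TSS back-dates Bob's certificate after the fact
print(chain_is_coherent(tss.chain))  # False
```

The check only catches inconsistencies within the chain. That is why the post stresses asking your neighbours in the chain to show you their certificates.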

Decentralized Method

Everybody is a TSS. So when you want to time-stamp a document, you send it (or rather, its hash) to a randomly chosen set of other participants, each acting as a TSS. Your proof that the document was time-stamped is the concatenation of all their responses.
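A rough sketch of the request side follows. The random choice is seeded with the document hash so that anyone can recompute which participants should have been asked. This seeding is my assumption for the sketch; the paragraph above only says "random". `choose_tsses`, `distributed_stamp`, and `fake_respond` are hypothetical. Real responses would be signed replies from remote machines, not local strings.

```python
import hashlib
import random
from typing import Callable, List

def choose_tsses(doc_hash: str, participants: List[str], k: int) -> List[str]:
    """Pick k distinct participants using a PRNG seeded with the document hash."""
    rng = random.Random(doc_hash)  # str seeds are deterministic across runs
    return rng.sample(participants, k)

def distributed_stamp(doc_hash: str, participants: List[str], k: int,
                      respond: Callable[[str, str], str]) -> str:
    """Ask each chosen TSS for a response and concatenate them into one proof."""
    chosen = choose_tsses(doc_hash, participants, k)
    return "\n".join(respond(tss_id, doc_hash) for tss_id in chosen)

def fake_respond(tss_id: str, h: str) -> str:
    # Hypothetical stand-in for a signed "id|hash|time" reply from a remote participant.
    return f"{tss_id}|{h}|1000"

participants = [f"node{i}" for i in range(20)]
doc_hash = hashlib.sha256(b"my lease").hexdigest()
proof = distributed_stamp(doc_hash, participants, k=5, respond=fake_respond)
print(proof)
```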

Visual representing a distributed TSS setup

According to the authors, this works because if we pick the set of TSSes properly at random, it is very unlikely that all of them are malicious. If only some of them are malicious, it would be obvious, because the responses would not all match. Furthermore, this scheme apparently does not require sending back the next client ID. The way I interpret it, the extra assurance that not all of the chosen TSSes are malicious lets us drop the need to validate that our time-stamped document is properly linked to by subsequent ones.
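Here is a sketch of the "do all the responses match?" check. I am assuming that matching means every TSS stamped the same hash and that their reported times agree within a tolerance. The 60-second default is an arbitrary choice of mine. Signatures are again omitted, and `Response` and `responses_agree` are hypothetical names.

```python
from typing import List, NamedTuple

class Response(NamedTuple):
    tss_id: str
    doc_hash: str
    timestamp: int  # seconds since the epoch, as reported by this TSS

def responses_agree(responses: List[Response], doc_hash: str, tolerance: int = 60) -> bool:
    """True if every TSS stamped our hash and all reported times lie within `tolerance` seconds."""
    if not responses:
        return False
    if any(r.doc_hash != doc_hash for r in responses):
        return False
    times = [r.timestamp for r in responses]
    return max(times) - min(times) <= tolerance

honest = [Response("node3", "abc", 1000), Response("node7", "abc", 1002),
          Response("node11", "abc", 1001)]
print(responses_agree(honest, "abc"))  # True

mixed = honest[:2] + [Response("node11", "abc", 1000 - 86400)]  # one TSS back-dates by a day
print(responses_agree(mixed, "abc"))   # False
```

If every chosen TSS colluded on the same false time, this check would pass. That is why the random choice of TSSes carries so much of the weight.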

In Closing

And that’s it. I think it is fair to say this design is the direct ancestor of Satoshi’s blockchain. Cheers.
