PDF file
These are notes on research into the original Bitcoin whitepaper PDF file.
The file used in these notes is from https://bitcoin.org/bitcoin.pdf.
Basic document facts and structure
Satoshi Nakamoto released the original Bitcoin whitepaper as a PDF document. I've used open source tools and techniques1 to analyze the facts and structure in this document.
Use the SHA-256 digest to check the file's integrity:
SHA-256 digest for bitcoin.pdf:
b1674191a88ec5cdd733e4240a81803105dc412d6c6708d53ab94fc248f4f553
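The integrity check can also be scripted. The following is a minimal sketch using Python's standard hashlib module; the local path bitcoin.pdf and the helper name sha256_of_file are assumptions for illustration, not part of the original notes.

```python
import hashlib
from pathlib import Path

# Digest quoted in these notes for bitcoin.pdf.
EXPECTED_SHA256 = "b1674191a88ec5cdd733e4240a81803105dc412d6c6708d53ab94fc248f4f553"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    pdf_path = Path("bitcoin.pdf")  # assumed local copy of the whitepaper
    if pdf_path.exists():
        actual = sha256_of_file(pdf_path)
        print("match" if actual == EXPECTED_SHA256 else f"MISMATCH: {actual}")
    else:
        print(f"{pdf_path} not found; download it first")
```

Reading in chunks is not needed for a file this small, but it keeps the helper usable on larger documents.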
PDF structure analysis:
pdfid.py bitcoin.pdf
PDFiD 0.2.8 bitcoin.pdf
PDF Header: %PDF-1.4
obj 67
endobj 67
stream 23
endstream 23
xref 1
trailer 1
startxref 1
/Page 9
/Encrypt 0
/ObjStm 0
/JS 0
/JavaScript 0
/AA 0
/OpenAction 1
/AcroForm 0
/JBIG2Decode 0
/RichMedia 0
/Launch 0
/EmbeddedFile 0
/XFA 0
/URI 0
/Colors > 2^24 0
This version of the whitepaper contains one /OpenAction entry. An open action is a destination to display or an action to perform, such as a request to an external website, that triggers when you open the file.
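To give a rough idea of what a keyword scan like this involves, here is a simplified sketch that counts PDF names in raw bytes. It is not how pdfid.py is implemented: the keyword list is a small hand-picked subset, the function name count_pdf_names is hypothetical, and the sketch ignores obfuscated (hex-escaped) names, which a real scanner has to handle.

```python
import re

# A few of the names that appear in the pdfid.py report above (not the full list).
KEYWORDS = [b"/Page", b"/Encrypt", b"/JS", b"/JavaScript", b"/OpenAction", b"/Launch"]


def count_pdf_names(data, keywords=KEYWORDS):
    """Count occurrences of each PDF name in raw bytes.

    The negative lookahead stops /Page from also matching /Pages.
    """
    counts = {}
    for keyword in keywords:
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        counts[keyword.decode("ascii")] = len(re.findall(pattern, data))
    return counts


# The catalog object from this document, used as sample input.
sample = b"<< /Type /Catalog /Pages 28 0 R /OpenAction [1 0 R /XYZ null null 0] >>"
print(count_pdf_names(sample))
```

On the sample, only /OpenAction is counted; /Pages does not count as /Page because of the lookahead.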
List the document's objects by type and search for the keyword to find that the /OpenAction is in object 66:
pdf-parser.py -a -O bitcoin.pdf
Comment: 3
XREF: 1
Trailer: 1
StartXref: 1
Indirect object: 67
42: 2, 3, 5, 6, 8, 9, 11, 12, 14, 15, 17, 18, 20, 21, 23, 24, 26, 27, 29, 30, 32, 34, 35, 37, 39, 40, 42, 44, 45, 47, 49, 50, 52, 54, 55, 57, 59, 60, 62, 64, 65, 67
/Catalog 1: 66
/Font 7: 33, 38, 43, 48, 53, 58, 63
/FontDescriptor 7: 31, 36, 41, 46, 51, 56, 61
/Page 9: 1, 4, 7, 10, 13, 16, 19, 22, 25
/Pages 1: 28
Search keywords:
/OpenAction 1: 66
Examine object 66, which is the document catalog rather than a stream:
pdf-parser.py -o 66 bitcoin.pdf
obj 66 0
Type: /Catalog
Referencing: 28 0 R, 1 0 R
<<
/Type /Catalog
/Pages 28 0 R
/OpenAction [1 0 R /XYZ null null 0]
/Lang (en-GB)
>>
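This is an explicit destination, not a stream. It tells the viewer to open page object 1 (the first page) using /XYZ with null left and top coordinates and a zoom of 0, which means the viewer keeps its current position and zoom. OpenOffice appears to write an entry like this when exporting to PDF so that the document opens on its first page.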
There's nothing to worry about here, although the reference to the en-GB language for the document is certainly interesting. Whether it's a genuine reflection of the operating environment at time of document authoring or a clever ruse is unknown.
Document metadata
When researching at the metadata level, some critical document properties emerge, namely details about the document creation. These details include the software and version along with creation date.
This page presents some of those details in plain text followed by their actual representation in the document, which sometimes includes hexadecimal encoding.
You can view and navigate through the entire document properties with:
pdf-parser.py bitcoin.pdf | less
PDF comments
If you browse the PDF data, the first things you'll notice are two PDF comments. The first is just the PDF version header:
PDF Comment '%PDF-1.4\n'
Then there's a second PDF comment, shown here with its non-ASCII bytes escaped:
PDF Comment '%\xc3\xa4\xc3\xbc\xc3\xb6\xc3\x9f\n'
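Decoded as UTF-8, this comment is the following four characters plus a newline:
äüöß
These characters probably carry no message. The PDF specification recommends following the header with a comment line containing at least four bytes with values of 128 or greater, so that file transfer tools treat the file as binary rather than text. Why these particular German characters were chosen is unclear.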
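The decoding can be checked in a few lines of Python. This is a small sketch; the byte string is copied from the pdf-parser.py output above, without the leading % and the trailing newline.

```python
# Bytes of the second header comment, as reported by pdf-parser.py.
marker = b"\xc3\xa4\xc3\xbc\xc3\xb6\xc3\x9f"

# Every byte is >= 0x80, that is, outside the 7-bit ASCII range.
print(all(byte >= 0x80 for byte in marker))  # True

# Read as UTF-8, the eight bytes form four two-byte characters.
print(marker.decode("utf-8"))  # äüöß
```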
A final comment value occurs at the end of the document:
PDF Comment '%%EOF\n'
This is not really a comment at all: %%EOF followed by a newline is the standard end-of-file marker that closes a PDF file. pdf-parser.py reports it as a comment only because the line begins with %.
Creation date
The /CreationDate in the file is D:20090324113315-06'00'.
This is a PDF date string of the form D:YYYYMMDDHHmmSSOHH'mm', not a Unix timestamp.
Here's how it breaks down:
- D: is the prefix that marks a PDF date string.
- 2009 is the four-digit year.
- 0324 represents March twenty-fourth (03 = March, 24 = day of the month).
- 113315 specifies the local time (11 hours, 33 minutes, and 15 seconds).
The timezone offset -06'00' gives the relationship of that local time to Coordinated Universal Time (UTC).
In this case:
- -06 means local time is 6 hours behind UTC.
- '00' is the minutes part of the offset (zero here). It says nothing about daylight saving time (DST).
Interpreted correctly, this would equate to:
2009-03-24 11:33:15 UTC-06 (that is, 17:33:15 UTC)
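The breakdown above can be expressed as a short parser. This is a sketch only: the function name parse_pdf_date is hypothetical, and it handles just the fully specified form with an explicit +HH'mm' or -HH'mm' offset, not the shorter variants or the Z suffix that the PDF date format also allows.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSS followed by a +HH'mm' or -HH'mm' offset.
PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})([+-])(\d{2})'(\d{2})'?$"
)


def parse_pdf_date(value):
    """Convert a fully specified PDF date string into an aware datetime."""
    match = PDF_DATE.match(value)
    if match is None:
        raise ValueError(f"unsupported PDF date string: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    sign = 1 if match.group(7) == "+" else -1
    offset = sign * timedelta(hours=int(match.group(8)), minutes=int(match.group(9)))
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone(offset))


created = parse_pdf_date("D:20090324113315-06'00'")
print(created.isoformat())                           # 2009-03-24T11:33:15-06:00
print(created.astimezone(timezone.utc).isoformat())  # 2009-03-24T17:33:15+00:00
```

Keeping the offset on the datetime object, rather than discarding it, makes the conversion to UTC a one-liner.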
Creator
The original value is a hex string of UTF-16 text with a leading byte order mark (FEFF):
<FEFF005700720069007400650072>
It decodes to the English word Writer.
It's not clear what 'Writer' means here, but the initial hunch is that it refers to OpenOffice Writer, a component of the OpenOffice suite. This suggests that tool is part of the authoring workflow. Of course, it could also just be a ruse to disguise the true authoring workflow.
More research into what this software writes to the field by default would help settle the question.
Producer
The original value is a hex string of UTF-16 text with a leading byte order mark (FEFF):
<FEFF004F00700065006E004F00660066006900630065002E006F0072006700200032002E0034>
It decodes to OpenOffice.org 2.4.
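Both the Creator and Producer hex strings can be decoded with a few lines of Python. This is a minimal sketch; the helper name decode_pdf_hex_string is hypothetical, and the Latin-1 fallback is only an approximation of PDFDocEncoding for strings without a byte order mark.

```python
def decode_pdf_hex_string(value):
    """Decode a PDF hex string such as <FEFF0057...> into text."""
    raw = bytes.fromhex(value.strip("<>"))
    if raw.startswith(b"\xfe\xff"):
        # Byte order mark present: the rest is big-endian UTF-16.
        return raw[2:].decode("utf-16-be")
    # Assumption: treat anything else as Latin-1 (approximates PDFDocEncoding).
    return raw.decode("latin-1")


creator = decode_pdf_hex_string("<FEFF005700720069007400650072>")
producer = decode_pdf_hex_string(
    "<FEFF004F00700065006E004F00660066006900630065002E006F0072006700200032002E0034>"
)
print(creator)   # Writer
print(producer)  # OpenOffice.org 2.4
```

Note that the Producer bytes include the .org suffix, which matches what exiftool reports for this field.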
Document checksum
The document checksum represented in the PDF is: 6F72EA7514DFAD23FABCC7A550021AF7
.
ID field
The /ID
field is interesting, because it's an MD5 digest of concatenated key document metadata items covered earlier with some known and some unknown values:
- Document title:
- Document author:
unknown
- Document subject:
unknown
- Document keywords:
unknown
- Document creator:
Writer
- Document producer:
OpenOffice.org 2.4
- Document creation date:
D:20090324113315-06'00'
or 2009-03-24 11:33:15 UTC-06
(or other variation)
/ID [<CA1B0A44BD542453BEF918FFCD46DC04><CA1B0A44BD542453BEF918FFCD46DC04>]
The advanced PDF forensics page details this further, including the source code responsible for generating this value.
exiftool abbreviated output
You can also use exiftool2 to process the file.
These are the relevant details from the output of exiftool bitcoin.pdf. They largely repeat the metadata found above with pdf-parser.py, but exiftool presents them more cleanly:
File Size : 184 kB
File Type : PDF
File Type Extension : pdf
MIME Type : application/pdf
PDF Version : 1.4
Linearized : No
Page Count : 9
Language : en-GB
Creator : Writer
Producer : OpenOffice.org 2.4
Create Date : 2009:03:24 11:33:15-06:00
Python code example
You can also use code, such as this Python example, to fetch the PDF from its URL and read its metadata directly. The output is a subset of what exiftool reports.
pip3 install --quiet --quiet --user pdfplumber requests && \
cat > /tmp/bitcoin_paper.py << 'EOF'
import io
import json

import pdfplumber
import requests

url = "https://bitcoin.org/bitcoin.pdf"

# Download the PDF into memory and fail early on HTTP errors.
response = requests.get(url, timeout=30)
response.raise_for_status()

# pdfplumber exposes the document information entries as a dict.
with pdfplumber.open(io.BytesIO(response.content)) as pdf:
    print(json.dumps(pdf.metadata, indent=4))
EOF
python3 /tmp/bitcoin_paper.py
Exact output:
{
    "Creator": "Writer",
    "Producer": "OpenOffice.org 2.4",
    "CreationDate": "D:20090324113315-06'00'"
}