Advanced PDF notes

From the perspective of a forensics examiner, the PDF document is a true flag to capture. As detailed in "How you will not uncover Satoshi", the file contains a unique ID that's essentially an MD5 digest of some known and some unknown values associated with the document. One of the unknowns covered by that MD5 digest is the original document path, which could contain an OS username, and thus a significant clue to the Nakamoto persona's true identity.

Whatever the actual authoring workflow was, some facts about the creation of this PDF file are not in doubt:

  • OpenOffice.org Writer version 2.4 generated the PDF file. While this software was likely also used to write the paper, there is still a possibility that the author didn't write it in this software, and that OpenOffice.org just served as an intermediary to convert the file from another format (for example, a LaTeX file authored elsewhere) into PDF.

  • Nakamoto used a Windows XP PC.

ID string as identity oracle

The original PDF contains an ID string that OpenOffice.org generates from an MD5 digest of document metadata items. Some of these items are known, and some are unknown.

One can't reverse the digest to reveal the values behind it, but one can recompute the digest from the known items combined with guesses for the unknowns, and compare each result against the ID stored in the PDF. A match would confirm the guessed values, including any OS username that forms part of the document's original filesystem path.

Known values              Unknown values
Title                     Author
Creator                   Subject
Producer                  Keywords
Document creation date

Direct document authoring and release workflow

It's plausible that the person who authored the paper did so entirely on a Windows XP system using the OpenOffice.org Writer software:

  1. Original document authored in OpenOffice.org Writer.

  2. OpenOffice.org document exported as PDF.

  3. PDF document published and shared.

Alternate document authoring and release workflow

If the person who authored the paper was an academic, then a workflow involving TeX/LaTeX is also plausible, given their popularity in academia:

  1. Original document authored in TeX/LaTeX.

  2. Original document imported into OpenOffice.org.

  3. OpenOffice.org document exported as PDF.

  4. PDF document published and shared.

Document ID value source code

The following is an abbreviated snippet of code from OpenOffice.org version 2.4 that shows how it generates the document ID value.

    // Concatenate the non-empty document info values into one buffer.
    OStringBuffer aID( 1024 );
    if( m_aDocInfo.Title.Len() )
        appendUnicodeTextString( m_aDocInfo.Title, aID );
    if( m_aDocInfo.Author.Len() )
        appendUnicodeTextString( m_aDocInfo.Author, aID );
    if( m_aDocInfo.Subject.Len() )
        appendUnicodeTextString( m_aDocInfo.Subject, aID );
    if( m_aDocInfo.Keywords.Len() )
        appendUnicodeTextString( m_aDocInfo.Keywords, aID );
    if( m_aDocInfo.Creator.Len() )
        appendUnicodeTextString( m_aDocInfo.Creator, aID );
    if( m_aDocInfo.Producer.Len() )
        appendUnicodeTextString( m_aDocInfo.Producer, aID );
    ...
    // Append the creation date string, then freeze the buffer.
    aID.append( m_aCreationDateString.getStr(), m_aCreationDateString.getLength() );
    aInfoValuesOut = aID.makeStringAndClear();
    osl_getSystemTime( &aGMT );
    // Feed the MD5 digest, in order: system time, document URL, info values.
    rtlDigestError nError = rtl_digest_updateMD5( m_aDigest, &aGMT, sizeof( aGMT ) );
    if( nError == rtl_Digest_E_None )
        nError = rtl_digest_updateMD5( m_aDigest, m_aContext.URL.getStr(), m_aContext.URL.getLength()*sizeof(sal_Unicode) ); // unicode value
    if( nError == rtl_Digest_E_None )
        nError = rtl_digest_updateMD5( m_aDigest, aInfoValuesOut.getStr(), aInfoValuesOut.getLength() );
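
To make the guess-and-compare idea concrete, here is a minimal Python sketch that mirrors the order of the three digest updates in the snippet above: system time, then document URL, then the concatenated info values. Several details are assumptions rather than facts taken from the source: that appendUnicodeTextString emits "FEFF" followed by the uppercase hex of the UTF-16BE text, that the time value is two little-endian 32-bit integers (seconds and nanoseconds), and that the URL is hashed as little-endian UTF-16 code units. The function names, the URL template, and the candidate list are hypothetical. The export time is also an unknown and would have to be guessed along with the path; the sketch simply takes it as an input.

```python
import hashlib
import struct


def unicode_text_string(value: str) -> bytes:
    # ASSUMPTION: stand-in for appendUnicodeTextString, taken here to be
    # "FEFF" followed by the uppercase hex of the UTF-16BE encoded text.
    return b"FEFF" + value.encode("utf-16-be").hex().upper().encode("ascii")


def document_id(seconds, nanosec, url, info_fields, creation_date):
    """Recompute a candidate ID as uppercase hex (output format assumed)."""
    # ASSUMPTION: the time struct is two little-endian unsigned 32-bit ints.
    time_bytes = struct.pack("<II", seconds, nanosec)
    # ASSUMPTION: 16-bit code units, little-endian as on an x86 machine.
    url_bytes = url.encode("utf-16-le")
    # Empty fields are skipped, as in the Len() checks of the snippet.
    info = b"".join(unicode_text_string(f) for f in info_fields if f)
    info += creation_date.encode("ascii")

    digest = hashlib.md5()
    digest.update(time_bytes)  # 1. system time at export
    digest.update(url_bytes)   # 2. document URL (contains the path)
    digest.update(info)        # 3. info values plus creation date
    return digest.hexdigest().upper()


def find_username(target_id, usernames, url_template, **fixed):
    """Return the first candidate whose digest matches target_id, else None."""
    for name in usernames:
        url = url_template.format(user=name)
        if document_id(url=url, **fixed) == target_id.upper():
            return name
    return None
```

A caller would pass the ID read from the PDF as target_id, a template such as "file:///C:/Documents%20and%20Settings/{user}/paper.odt" (a made-up path), and the remaining values as keyword arguments. In practice every unknown, including the exact export time, multiplies the number of candidates to try, so this only shows the shape of the comparison.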