Sara

Reputation: 125

iText: extract the original from a signed PDF and compare hashes

I have a signed PDF. The signature covers the entire document and it is valid.

I want to extract the original pdf to compare its hash with that of the unsigned pdf.

I extract the original PDF using the following code:

PdfReader reader = new PdfReader(FILESIGNED);
AcroFields acrofields = reader.getAcroFields();
// the PDF contains exactly one signature
String signatureName = acrofields.getSignatureNames().get(0);
// copy the revision that this signature covers into FILEORIGINAL
try (InputStream ip = acrofields.extractRevision(signatureName);
     FileOutputStream os = new FileOutputStream(FILEORIGINAL)) {
    byte[] bb = new byte[1024];
    int n;
    while ((n = ip.read(bb)) > 0)
        os.write(bb, 0, n);
}
reader.close();

But the extracted PDF is not the same as the original. I would like to extract the revision from before the signature was applied. Is that possible?

Thanks for your help. Sara

Upvotes: 4

Views: 2471

Answers (1)

mkl

Reputation: 95918

> I want to extract the original pdf to compare its hash with that of the unsigned pdf.

In general this is not possible.

When iText (or other PDF signing libraries or applications) sign a PDF, they:

  1. add a signature form field to the PDF (unless an empty signature form field exists and is chosen for use in signing);
  2. add a dictionary object to the PDF with some signing related entries, in particular a big placeholder entry into which eventually a CMS signature container will be inserted; this dictionary is set as the value of the aforementioned form field;
  3. add a visualization to the form field, often containing some data from the signer certificate (unless the signature is chosen to be invisible);
  4. make some other form fields read-only if an empty signature form field with field lock information is signed;
  5. finalize the PDF, i.e. they set metadata like time-of-last-change and then write the finished PDF into a file or some byte array;
  6. calculate the hash value of the finished PDF excluding the value of the big placeholder but including all other changes made as described above;
  7. sign this hash value resulting in a CMS signature container;
  8. and put this signature container into the big placeholder.
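
To illustrate steps 6 to 8, here is a minimal sketch of hashing a file while skipping the placeholder. It assumes the covered ranges are given as offset/length pairs (as in the signature dictionary's ByteRange entry); the class name, the method name and the toy byte strings are made up for illustration and are not iText API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper, not part of iText.
final class ByteRangeDigest {

    /**
     * Hashes only the bytes covered by byteRange, which is read as
     * [offset1, length1, offset2, length2, ...]. Everything between the
     * ranges (the signature placeholder) is left out of the hash.
     */
    static byte[] digest(byte[] pdf, int[] byteRange, String algorithm)
            throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        for (int i = 0; i + 1 < byteRange.length; i += 2) {
            md.update(pdf, byteRange[i], byteRange[i + 1]);
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        // Toy stand-ins for a PDF: the part in <...> plays the placeholder.
        byte[] empty  = "AAAA<0000>BBBB".getBytes(StandardCharsets.US_ASCII);
        byte[] filled = "AAAA<CAFE>BBBB".getBytes(StandardCharsets.US_ASCII);
        int[] byteRange = {0, 4, 10, 4}; // skips bytes 4..9, i.e. "<....>"

        byte[] h1 = digest(empty, byteRange, "SHA-256");
        byte[] h2 = digest(filled, byteRange, "SHA-256");
        // Filling the placeholder does not change the signed hash.
        System.out.println(Arrays.equals(h1, h2));
    }
}
```

Because everything outside the placeholder is hashed, the hash of the signed file necessarily differs from the hash of the file as it was before steps 1 to 5.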

Thus, in general the "original pdf" cannot be extracted anymore from the signed PDF file because the changes described above may have fundamentally changed the internal structure of the PDF.

There is one exception, though: If those changes were applied as an incremental update (in iText lingo: in append mode), it usually is possible to retrieve the original by cutting off that incremental update.

For this one merely has to search for the last end-of-file marker (%%EOF) before the signature and cut off everything after it. (Actually there is a small amount of uncertainty: a final end-of-line marker may or may not be part of the original PDF.)
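
A minimal sketch of that cut, working on the raw bytes of the signed file. It assumes you already know a byte offset inside the signing update, called signatureOffset here (for example the end of the first segment of the signature's ByteRange); the class and method names are hypothetical and not iText API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of iText.
final class RevisionCutter {
    private static final byte[] EOF_MARKER =
            "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Index of the last marker that ends at or before limit, or -1. */
    static int lastIndexOf(byte[] data, byte[] marker, int limit) {
        int start = Math.min(limit, data.length) - marker.length;
        for (; start >= 0; start--) {
            boolean match = true;
            for (int j = 0; j < marker.length; j++) {
                if (data[start + j] != marker[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return start;
            }
        }
        return -1;
    }

    /**
     * Returns the bytes up to and including the last %%EOF found before
     * signatureOffset, or null if there is no such marker.
     */
    static byte[] cutBeforeSignature(byte[] pdf, int signatureOffset) {
        int pos = lastIndexOf(pdf, EOF_MARKER, signatureOffset);
        if (pos < 0) {
            return null;
        }
        return Arrays.copyOf(pdf, pos + EOF_MARKER.length);
    }

    public static void main(String[] args) {
        // Toy stand-in for a PDF with one incremental update.
        String signed = "%PDF-1.4\nbody\n%%EOF\nupdate /Contents <00>\n%%EOF\n";
        byte[] bytes = signed.getBytes(StandardCharsets.US_ASCII);
        byte[] original = cutBeforeSignature(bytes, signed.indexOf("/Contents"));
        System.out.println(new String(original, StandardCharsets.US_ASCII));
    }
}
```

The sketch cuts directly after %%EOF. Since a trailing end-of-line may have belonged to the original, when comparing hashes it can be worth also trying the result with "\n", "\r\n" or "\r" appended. If the only %%EOF found is the one at the very end of the file, the signature was not added as an incremental update and the original cannot be recovered this way.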

Upvotes: 6
