Axel Herrmann

Reputation: 119

PDFBox Signatures: Get range of pages which are covered by signature

I'm using PDFBox to retrieve information about signatures of PDF files:

for (PDSignature signature : document.getSignatureDictionaries()) {
  // retrieve information from signature
}

Is it possible to determine which pages of the PDF document are covered by a signature?

I've seen that it's possible to determine on which page the signature starts: https://stackoverflow.com/a/22132921/19199839
And I've seen that it's possible to determine whether it covers the whole document: https://stackoverflow.com/a/58102825/19199839

But I'm not sure if/how it's possible to determine the page range of a signature. Is there any way, e.g., to get that information from the ByteRange?

Upvotes: 0

Views: 433

Answers (1)

mkl

Reputation: 95918

To sum up the comments...

"But I'm not sure if/how it's possible to determine the page range of a signature."

First of all, you only talk about "signatures". Evidently you mean digital signatures in PDFs. Furthermore, I assume you mean the interoperable kind, i.e. those following the explicit signature schemes defined in the PDF specification, ISO 32000-2.

So indeed, the byte ranges given by the ByteRange entry of the signature dictionary describe the sections of the PDF signed by the signature. The entry is specified as "an array of pairs of integers (starting byte offset, length in bytes) that shall describe the exact byte range for the digest calculation. Multiple discontiguous byte ranges shall be used to describe a digest that does not include the signature value (the Contents entry) itself."

This specification appears to allow quite arbitrary byte ranges which might only include some pages or even only some parts of some pages.
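To make the pair structure concrete, here is a minimal sketch that splits a ByteRange array into the signed stretches it describes. The class and method names are hypothetical helpers, not PDFBox API; with PDFBox the input array would typically come from `PDSignature.getByteRange()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of PDFBox.
class ByteRangeSegments {

    /**
     * Splits a ByteRange array of (offset, length) pairs into segments.
     * Each returned long[] holds {start, endExclusive}.
     */
    static List<long[]> segments(int[] byteRange) {
        if (byteRange == null || byteRange.length == 0 || byteRange.length % 2 != 0) {
            throw new IllegalArgumentException("ByteRange must hold pairs of integers");
        }
        List<long[]> result = new ArrayList<>();
        for (int i = 0; i < byteRange.length; i += 2) {
            long start = byteRange[i];
            long length = byteRange[i + 1];
            result.add(new long[] {start, start + length});
        }
        return result;
    }
}
```

For a ByteRange of [0 840 960 240] this yields the signed stretches 0..839 and 960..1199; the unsigned gap 840..959 in between is where the Contents value sits. Note that these are byte offsets into the file, not page numbers.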

Actually, though, the specification continues and says: "If SubFilter is ETSI.CAdES.detached or ETSI.RFC3161, the ByteRange shall cover the entire PDF file, including the signature dictionary but excluding the Contents value."

Thus, in case of real PAdES signatures and document time stamps, the byte range must be the whole signed revision except the signature container placeholder.

The specification also says in general: "This range should be the entire PDF file, including the signature dictionary but excluding the signature value itself (the Contents entry)." While this is only a recommendation ("should"), Adobe Acrobat, the predominant validator, requires this even for non-PAdES signatures.

Thus, a valid signature always covers the whole revision - all pages - of the PDF in which this signature has been added, in case of PAdES signatures by specification, otherwise by decision of the predominant validator.
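A rough plausibility check along these lines can be sketched as follows. It only looks at the numbers in the ByteRange and the file length; it does not verify that the gap holds exactly the Contents value, and it is no substitute for actual signature validation. All names are hypothetical, not PDFBox API.

```java
// Hypothetical helper for illustration; not part of PDFBox.
class ByteRangeCoverage {

    enum Coverage { WHOLE_FILE, FOLLOWED_BY_UPDATES, UNEXPECTED }

    /**
     * Classifies a ByteRange against the length of the file it was read from.
     * Assumes the usual layout: two pairs with a single gap for Contents.
     */
    static Coverage classify(int[] byteRange, long fileLength) {
        if (byteRange == null || byteRange.length != 4) {
            return Coverage.UNEXPECTED;
        }
        long firstStart = byteRange[0];
        long firstEnd = firstStart + byteRange[1];
        long secondStart = byteRange[2];
        long secondEnd = secondStart + byteRange[3];

        // The signed range must start at the first byte of the file.
        if (firstStart != 0 || byteRange[1] <= 0 || byteRange[3] <= 0) {
            return Coverage.UNEXPECTED;
        }
        // There must be a gap between the two stretches (for the Contents value).
        if (secondStart <= firstEnd) {
            return Coverage.UNEXPECTED;
        }
        // The signed range cannot reach past the end of the file.
        if (secondEnd > fileLength) {
            return Coverage.UNEXPECTED;
        }
        // Ending before the end of the file means bytes were appended after signing.
        return secondEnd == fileLength ? Coverage.WHOLE_FILE : Coverage.FOLLOWED_BY_UPDATES;
    }
}
```

A result of FOLLOWED_BY_UPDATES means the signature covers a complete earlier revision and something was appended afterwards, not that only some pages were signed.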


Of course it is possible to change a signed PDF without cryptographically breaking the signature by appending changes in incremental updates.

But while in theory such changes could arbitrarily change the document contents, the initial signature of a PDF as an author signature can restrict the type of changes allowed in such incremental updates. If you take a look at the details, you'll see that adding arbitrary pages or removing them can never be allowed by these author signatures.

And even if the first signature is not such an author signature but a mere standard approval signature, the predominant validator, Adobe Acrobat, assumes permissions similar to the most lax author signature type, also not allowing arbitrary page additions or removals. For details see this answer.

Thus again, in a PDF with a valid signature all pages are signed, either by spec (in case of the presence of an author signature) or by decision of the predominant validator.


Actually there is one way to be allowed to add pages to a document: PDFs may contain so-called page templates, pages that are not shown; such templates can become visible by spawning the template. If the template was already part of the originally signed file, spawning it is allowed in an already signed PDF as long as the author signature allows form fill-ins.

As those templates are in the originally signed PDF revision, these template pages can already be considered signed. Thus, even with spawned templates, in a way all visible pages in a document are signed.

But in a stricter interpretation, considering only "visible" pages, a signed PDF with later incremental updates may contain more pages than the originally signed revision. This information cannot be gathered from the original byte range, though. You have to analyze the additions in the incremental updates and check whether there are any additional visible pages that were not visible before signing.
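As a possible starting point for such an analysis, the signed revision can be cut out of the file: it is everything from the first byte up to the end of the last ByteRange pair. The sketch below does only that, using a hypothetical helper that is not PDFBox API and assuming the pairs are in ascending order.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of PDFBox.
class SignedRevision {

    /**
     * Returns the bytes of the revision the signature was applied to,
     * i.e. the file from offset 0 up to the end of the last ByteRange pair
     * (including the Contents gap in between).
     */
    static byte[] extract(byte[] pdf, int[] byteRange) {
        if (byteRange == null || byteRange.length < 2 || byteRange.length % 2 != 0) {
            throw new IllegalArgumentException("ByteRange must hold pairs of integers");
        }
        long end = (long) byteRange[byteRange.length - 2] + byteRange[byteRange.length - 1];
        if (end <= 0 || end > pdf.length) {
            throw new IllegalArgumentException("ByteRange does not fit the file");
        }
        return Arrays.copyOf(pdf, (int) end);
    }
}
```

One could then load both these bytes and the full file with PDFBox (for example `PDDocument.load(byte[])` in 2.x or `Loader.loadPDF(byte[])` in 3.x) and compare `getNumberOfPages()`. A differing page count is only a first hint that pages became visible after signing; it does not replace a proper analysis of the incremental updates.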

Upvotes: 3
