Anon

Reputation: 431

How do PDF readers validate form fields?

I was looking at the source code of several pdf files which were digitally signed (and had annotations and form fields as well).

I noticed that each "Annot" dictionary has a "M" value which stores the latest time it was modified - which can then be checked with the "M" value for the "Sig" dictionary which stores when the pdf file was digitally signed.

However, I noticed that dictionaries with type "XObject" and subtype "Form" do not have an "M" value - i.e. do not store the time at which said form was modified. In such cases, how do pdf readers validate whether a change to form field is allowed (for eg, in a digital sign where no changes are allowed, no form field can be changed after the digital sign is done - how is this verified?)?

I just attached an example pdf at this link:

https://www.mediafire.com/file/q8ed9rkf35kgxgq/output.txt/file

Upvotes: 4

Views: 307

Answers (1)

mkl

Reputation: 96064

Some Misconceptions

There apparently are a number of misconceptions to clear up here.

I noticed that each "Annot" dictionary has a "M" value which stores the latest time it was modified - which can then be checked with the "M" value for the "Sig" dictionary which stores when the pdf file was digitally signed.

  • The M entry of annotation dictionaries is optional, so you cannot count on it being there.
  • The annotation M value is essentially just a string; the PDF specification merely recommends, but does not require, that it contain a date in the specified format, so you might find a value like "my mother's 42nd birthday" in it.
  • The annotation M value is not backed by a digital timestamp, so a forger could put anything there.

Furthermore, the M entry of a signature dictionary is also optional, and by itself it cannot be trusted either.

Thus, no, this represents no means to validate anything.
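
To illustrate why the M values are poor evidence, here is a minimal Python sketch. The function name `looks_like_pdf_date` and the simplified pattern are assumptions made for illustration and are not part of any PDF library. It only checks whether a value has the shape of a PDF date string; even a value that passes is unsigned text a forger could have written.

```python
import re
from typing import Optional

# Simplified pattern for PDF date strings such as D:20200131120000+01'00'.
# Assumption: the "D:" prefix is present; everything after the year is optional.
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?([Zz+\-].*)?$"
)


def looks_like_pdf_date(value: Optional[str]) -> bool:
    """Return True only if an M value is present and shaped like a PDF date."""
    return value is not None and _PDF_DATE.match(value) is not None


for m_value in ("D:20200131120000+01'00'", "my mother's 42nd birthday", None):
    print(repr(m_value), looks_like_pdf_date(m_value))
```

A validator therefore has to cope with all three cases (well-formed, free-form, absent), and none of them proves when the object was actually written.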

However, I noticed that dictionaries with type "XObject" and subtype "Form" do not have an "M" value - i.e. do not store the time at which said form was modified. In such cases, how do pdf readers validate whether a change to form field is allowed (for eg, in a digital sign where no changes are allowed, no form field can be changed after the digital sign is done - how is this verified?)?

First of all, as explained above, the M values cannot be used at all for modification detection, so whether some objects do or do not have them is irrelevant.

Furthermore, a form XObject by itself is not a form field in the sense of the document modification detection and prevention settings of a signature. The form fields those settings refer to are AcroForm form fields (or, in a deprecated special case, XFA form fields). A form XObject may be used as the appearance of an AcroForm form field, but in that case the thing to check is the form field itself.

How to Validate Changes

(For some background on PDF signatures you may want to read this answer first.)

Depending on the document modification detection and prevention (DocMDP) settings of the signatures of a document, only certain changes to the document are allowed; see this answer.

But even the allowed changes must not be applied by changing the original objects in the PDF. After all, that would change the digest over the originally signed byte ranges and so invalidate the signature. Thus, the changed and added objects are appended to the PDF, capped off by a cross reference table or stream for these objects.

Thus, what you have to do for DocMDP validation is to determine the extent of the signed revision in the PDF file, find out that way what has been appended, and analyze whether those additions change the signed revision in allowed or disallowed ways.

While this may sound simple at first, it is not, in particular because "allowed" and "disallowed" changes are characterized by their effects on document appearance and behavior, not by the actual PDF objects that may be affected.

ETSI working groups are currently attempting to transform those characterizations into criteria for PDF objects; the results are to be published as ETSI TS 119 102-3, probably in multiple parts.

Some Details

In comments you asked

how do you tell from the appearance of a modified object, whether it was added before or after the digital sign?

Well, as mentioned above:

  • First you determine the extent of the signed revision in the PDF file.

    I.e. you take the ByteRange entry of the signature dictionary and take the start of the lower range and the end of the higher range. E.g. if that entry is

    [ 0 67355 72023 6380 ] 
    

    the signed revision starts at offset 0 and ends at offset 72023+6380-1=78402 inclusive.

    (Obviously some sanity checks are indicated, in particular that the start offset is 0, that the gap in the signed byte ranges exactly contains the signature dictionary Contents value, that the signed revision as a whole is a valid PDF and all its cross references point to indirect objects completely contained in that signed revision, and that that signed revision indeed is a previous revision of the whole PDF, i.e. that the chain of cross reference streams or tables contains the cross reference stream or table of that revision.)

  • Then you find out that way what has been appended.

    I.e. you compare the cross reference stream or table of the whole file and the cross reference stream or table of the signed revision.

    If some object is referenced for an object number now but was not referenced for that object number in the signed revision, you found a change to check.

    (Strictly speaking you should iterate along the chain of cross reference streams or tables from the signed revision to the whole file, i.e. revision by revision, and check the changes in each revision.)
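
The first step above, together with a few of the sanity checks, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `signed_revision_extent` and the `file_size` parameter are hypothetical, the ByteRange is assumed to have exactly four integers, and the checks that require actually parsing the PDF (Contents position, cross reference integrity) are not covered.

```python
from typing import List, Tuple


def signed_revision_extent(
    byte_range: List[int], file_size: int
) -> Tuple[int, int, Tuple[int, int]]:
    """Return (start, end_inclusive, (gap_start, gap_end_exclusive)).

    Assumes the common case of a ByteRange with two (offset, length) pairs.
    """
    if len(byte_range) != 4:
        raise ValueError("expected a ByteRange with exactly four integers")
    off1, len1, off2, len2 = byte_range
    if off1 != 0:
        raise ValueError("signed revision must start at offset 0")
    if len1 <= 0 or len2 <= 0 or off2 < off1 + len1:
        raise ValueError("byte ranges are empty or overlap")
    end_exclusive = off2 + len2
    if end_exclusive > file_size:
        raise ValueError("byte ranges extend past the end of the file")
    # The gap between the two ranges should hold exactly the Contents value;
    # verifying that requires locating Contents in the file (not done here).
    gap = (off1 + len1, off2)
    return off1, end_exclusive - 1, gap


# file_size is a made-up value for illustration.
start, end, gap = signed_revision_extent([0, 67355, 72023, 6380], file_size=100_000)
print(start, end, gap)
```

Everything in the file after the returned end offset was appended after signing and is what the following step has to examine.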

For this procedure you obviously have to use the original file, not some version uncompressed by tools like qpdf; otherwise you cannot do the offset tests.
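
The cross reference comparison from the second step can be sketched like this. It assumes the two cross reference views have already been parsed into plain dictionaries by some other code; the `XrefEntry` type and the function `objects_to_check` are hypothetical simplifications (for example, objects stored in object streams are not modelled).

```python
from typing import Dict, Set, Tuple

# Hypothetical simplified entry: (byte offset in the file, generation number).
XrefEntry = Tuple[int, int]


def objects_to_check(
    signed: Dict[int, XrefEntry], whole_file: Dict[int, XrefEntry]
) -> Set[int]:
    """Object numbers whose entry now differs from, or is missing in, the signed revision."""
    return {
        number
        for number, entry in whole_file.items()
        if signed.get(number) != entry
    }


# Made-up entries: object 2 was redefined and object 3 was added after signing.
signed_xref = {1: (15, 0), 2: (120, 0)}
whole_xref = {1: (15, 0), 2: (79000, 0), 3: (79500, 0)}
print(sorted(objects_to_check(signed_xref, whole_xref)))  # [2, 3]
```

Each object number returned is only a candidate; whether the change is allowed still has to be decided against the DocMDP settings.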

Is it possible for an attacker to add a new annotation object before the xref table corresponding to the digital signature, and adjust the previous xref table values, so that a broken document passes as an accepted document?

No. The signed revision including its cross references is covered by the signature. Manipulating those bytes will invalidate the signature mathematically.
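
A toy Python sketch of why this holds: the digest is computed only over the signed byte ranges, so appended bytes leave it unchanged, while any change inside the ranges alters it. SHA-256 and the function name `signed_bytes_digest` are assumptions for illustration; the algorithm actually used is determined by the signature itself, and a real validator also verifies the signature over that digest.

```python
import hashlib
from typing import Sequence


def signed_bytes_digest(data: bytes, byte_range: Sequence[int]) -> bytes:
    """Hash only the bytes covered by ByteRange (pairs of offset, length)."""
    digest = hashlib.sha256()  # assumption: the real algorithm comes from the signature
    for offset, length in zip(byte_range[0::2], byte_range[1::2]):
        digest.update(data[offset:offset + length])
    return digest.digest()


# Toy "file": 10 signed bytes, a 5-byte gap standing in for Contents, 10 more signed bytes.
original = b"A" * 10 + b"<sig>" + b"B" * 10
byte_range = [0, 10, 15, 10]
reference = signed_bytes_digest(original, byte_range)

appended = original + b" incremental update"  # added after the signed revision
tampered = b"X" + original[1:]                # one byte changed inside it

print(signed_bytes_digest(appended, byte_range) == reference)  # True
print(signed_bytes_digest(tampered, byte_range) == reference)  # False
```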

Upvotes: 3
