PDF: Erroneous cross-reference stream accepted by applications

Question

While I was annotating a file in Drawboard PDF, it generated this strange cross-reference stream (unless I have misunderstood something, I would call it incorrect):

/W = [1, 2, 0]
Data = [2, 1, 183, 248, 2, 0, 1, 88, 2, 0, 3, 245, 2, 0, 0, 21, 2, 0, 2, 7]

From the reference:

A value of zero for an element in the W array indicates that the corresponding field is not present in the stream, and the default value is used, if there is one.

Each entry in the data should have 1 + 2 + 0 = 3 bytes, but clearly this is not the case: the 20 bytes form 5 entries of 4 bytes each. No application I tried has problems opening the file, but I don't know how to handle this in my PDF library.

iPDFdev · Accepted Answer

Since the first byte of each 4-byte group is 2, I assume that your data has a predictor applied to it (a leading 2 is the tag of the PNG "Up" filter). After decompression, if you reverse the predictor, you will get 5 groups of 3 bytes.
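As a rough illustration, the bytes from the question can be run through a reversal of the PNG "Up" filter by hand. This is only a sketch: the function names are made up for this example, it handles filter tag 2 only, and it assumes the row width (Columns) equals the sum of the /W entries, which is the usual arrangement but is not stated in the question.

```python
# Bytes exactly as listed in the question (already inflated).
DATA = [2, 1, 183, 248, 2, 0, 1, 88, 2, 0, 3, 245, 2, 0, 0, 21, 2, 0, 2, 7]
W = [1, 2, 0]


def undo_up_predictor(data, columns):
    """Reverse the PNG 'Up' filter (tag 2) on rows of 1 tag byte + `columns` bytes."""
    row_len = columns + 1
    if len(data) % row_len != 0:
        raise ValueError("data length is not a multiple of the row length")
    previous = [0] * columns          # the row above the first row is all zeros
    rows = []
    for start in range(0, len(data), row_len):
        tag = data[start]
        if tag != 2:
            raise ValueError("this sketch only handles the Up filter (tag 2)")
        raw = data[start + 1:start + row_len]
        # Each byte is stored as a difference from the byte directly above it.
        row = [(b + p) & 0xFF for b, p in zip(raw, previous)]
        rows.append(row)
        previous = row
    return rows


def split_fields(row, widths):
    """Cut one decoded row into big-endian integers according to /W."""
    fields, pos = [], 0
    for width in widths:
        if width == 0:
            fields.append(None)       # field absent: the caller applies the default
        else:
            fields.append(int.from_bytes(bytes(row[pos:pos + width]), "big"))
        pos += width
    return tuple(fields)


rows = undo_up_predictor(DATA, sum(W))
entries = [split_fields(row, W) for row in rows]
for entry in entries:
    print(entry)
```

With the question's data this prints five entries of type 1 with byte offsets 47096, 47184, 47941, 47962 and 48481. The third field is absent because its width is 0, so the default (generation 0 for a type 1 entry) applies.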

Update: As I assumed, the xref stream uses a predictor to improve compression (this is just one sample xref stream from your file; all the others follow the same pattern):

92 0 obj
<<
  /DecodeParms <>
  /Filter /FlateDecode
  /ID [<1EBBF34ADD340749DCDB9CA0F9F0F8F8> <1EBBF34ADD340749DCDB9CA0F9F0F8F8>]
  /Index [48 1 89 4]
  /Info 2 0 R
  /Length 28
  /Prev 46782
  /Root 1 0 R
  /Size 93
  /Type /XRef
  /W [1 2 0]>>
stream
  xœcbÜþƒ‰1‚‰ù+ƒ(; +ª*
endstream
endobj

After decompression you have 5 groups of 4 bytes. You have to reverse the predictor; otherwise the decompressed data is invalid. Once the predictor has been reversed, you get 5 groups of 3 bytes.
There is no error here: the stream is valid, and PDF libraries are right to accept it.
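For a library, the practical consequence is that the /W widths must be applied only after the filter's decode parameters have been honoured. The following is a minimal sketch of that order of operations, not a definitive implementation: `decode_flate` and the plain-dict representation of /DecodeParms are hypothetical, the values Predictor 12 and Columns 3 used in the check are assumptions (the contents of the /DecodeParms dictionary did not survive in the listing above), and the code assumes Colors 1 and BitsPerComponent 8, so one byte per sample.

```python
import zlib


def paeth(left, up, up_left):
    """Paeth predictor as defined for PNG filter type 4."""
    p = left + up - up_left
    pa, pb, pc = abs(p - left), abs(p - up), abs(p - up_left)
    if pa <= pb and pa <= pc:
        return left
    return up if pb <= pc else up_left


def decode_flate(raw, parms):
    """Inflate `raw`, then reverse a PNG predictor if `parms` asks for one."""
    data = zlib.decompress(raw)
    predictor = parms.get("Predictor", 1)      # 1 means no prediction
    if predictor == 1:
        return data
    if predictor < 10:
        raise NotImplementedError("TIFF predictor (2) is not covered here")
    columns = parms.get("Columns", 1)
    row_len = columns + 1                       # one filter tag byte per row
    if len(data) % row_len != 0:
        raise ValueError("inflated length is not a multiple of the row length")
    out = bytearray()
    prev = bytes(columns)                       # row above the first is zeros
    for start in range(0, len(data), row_len):
        tag = data[start]
        cur = bytearray(data[start + 1:start + row_len])
        for i in range(columns):
            left = cur[i - 1] if i else 0
            up = prev[i]
            up_left = prev[i - 1] if i else 0
            if tag == 0:
                pred = 0                        # None
            elif tag == 1:
                pred = left                     # Sub
            elif tag == 2:
                pred = up                       # Up
            elif tag == 3:
                pred = (left + up) // 2         # Average
            elif tag == 4:
                pred = paeth(left, up, up_left) # Paeth
            else:
                raise ValueError("unknown PNG filter tag %d" % tag)
            cur[i] = (cur[i] + pred) & 0xFF
        out += cur
        prev = bytes(cur)
    return bytes(out)


# Round trip using the 20 predicted bytes quoted in the question.
predicted = bytes([2, 1, 183, 248, 2, 0, 1, 88, 2, 0, 3, 245,
                   2, 0, 0, 21, 2, 0, 2, 7])
decoded = decode_flate(zlib.compress(predicted), {"Predictor": 12, "Columns": 3})
print(len(decoded))
```

The decoded length is 15 bytes, that is 5 rows of 3 bytes, which is exactly what /W [1 2 0] calls for. Calling the same function with an empty parameter dictionary returns the 20 inflated bytes unchanged, which is the mismatch described in the question.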
