tschmit007

Reputation: 7800

Reading a PDF version >= 1.5: how to handle a cross-reference stream dictionary

I'm trying to read the xref table of a PDF of version >= 1.5.

The xref table is an object:

58 0 obj
<</DecodeParms<</Columns 4/Predictor 12>>/Filter/FlateDecode/ID[<CB05990F613E2FCB6120F059A2BCA25B><E2ED9D17A60FB145B03010B70517FC30>]/Index[38 39]/Info 37 0 R/Length 96/Prev 67529/Root 39 0 R/Size 77/Type/XRef/W[1 2 1]>>stream
(96 bytes of FlateDecode-compressed binary data, which can't be pasted as text)
endstream
endobj

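As you can see, /W is [1 2 1], so each entry should be 1 + 2 + 1 = 4 bytes long, and /Index [38 39] declares 39 entries (objects 38 to 76), i.e. 39 * 4 = 156 bytes.

But the decompressed stream is 195 bytes long (39 * 5 = 195). So is an entry 4 or 5 bytes long?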
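The size arithmetic can be checked directly from the dictionary values. A minimal Python sketch; the constants are copied by hand from the dictionary above, and the last figure assumes (as the answer below explains) one PNG predictor byte in front of every /Columns-wide row:

```python
# Values copied by hand from the xref stream dictionary above.
W = [1, 2, 1]          # field widths in bytes: type, field 2, field 3
INDEX = [38, 39]       # pairs of (first object number, number of entries)
COLUMNS = 4            # /DecodeParms /Columns
PREDICTOR = 12         # /DecodeParms /Predictor (>= 10 means a PNG predictor)

entry_size = sum(W)                    # bytes per entry
entry_count = sum(INDEX[1::2])         # entries over all subsections
plain_size = entry_size * entry_count  # stream size if there were no predictor

# With a PNG predictor, each row of COLUMNS bytes is preceded by one
# filter-type byte. Here COLUMNS == entry_size, so there is one row per entry.
row_size = COLUMNS + 1 if PREDICTOR >= 10 else COLUMNS
rows = plain_size // COLUMNS
predicted_size = row_size * rows

print(entry_size, entry_count, plain_size, predicted_size)  # 4 39 156 195
```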

Here are the first inflated bytes:

02 01 00 10 00 02 00 02 cd 00 02 00 01 51 00 02 00 01 70 00 02 00 05 7a 00 02
            ^^

If an entry is 4 bytes long, then the Root entry (object 39) is a free object (see the ^^)!

If an entry is 5 bytes long, how should its fields be interpreted (with reference to PDF Reference, section 3.4.7, Table 3.16)?

Take object 38, the first one in the stream: since it seems to be of type 2, it would be the object at index 16 in object stream number 256, but there is no object 256 in my PDF file!
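Table 3.16 describes each entry as up to three unsigned big-endian integers whose byte widths come from /W. A minimal Python sketch of that field split (parse_entry and FIELD_MEANING are hypothetical names, not part of any library); applied to the first four raw bytes, 02 01 00 10, it reproduces the "type 2, object stream 256, index 16" reading above:

```python
from typing import Sequence, Tuple

# Meaning of fields 2 and 3 for each entry type (PDF Reference 1.7, Table 3.16).
FIELD_MEANING = {
    0: ("next free object number", "generation number if reused"),  # free
    1: ("byte offset in the file", "generation number"),            # in use
    2: ("object stream number", "index within the object stream"),  # compressed
}


def parse_entry(entry: bytes, w: Sequence[int]) -> Tuple[int, int, int]:
    """Split one xref-stream entry into (type, field2, field3)."""
    if len(entry) != sum(w):
        raise ValueError(f"expected {sum(w)} bytes, got {len(entry)}")
    fields = []
    pos = 0
    for width in w:
        # Each field is a big-endian unsigned integer; a width of 0 yields 0.
        fields.append(int.from_bytes(entry[pos:pos + width], "big"))
        pos += width
    if w[0] == 0:
        fields[0] = 1  # type field omitted: defaults to type 1
    return fields[0], fields[1], fields[2]


naive = parse_entry(bytes.fromhex("02010010"), [1, 2, 1])
print(naive)                    # (2, 256, 16)
print(FIELD_MEANING[naive[0]])  # ('object stream number', 'index within the object stream')
```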

The question is: how should I handle these 195 bytes?

Upvotes: 8

Views: 4130

Answers (1)

Jongware

Reputation: 22457

A compressed xref stream may have been run through one of the PNG predictor filters before compression. If the /Predictor value is 10 or greater ("a Predictor value greater than or equal to 10 merely indicates that a PNG predictor is in use; the specific predictor function used is explicitly encoded in the incoming data" [1]), the PNG filter types are stored inside the compressed data "as usual", i.e. as the first byte of each row, where a row is /Columns bytes wide (here 4, the same as the total width of /W).
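Because the filter type is stored per row, a reader has to be ready for any of the five PNG filter types (None, Sub, Up, Average, Paeth), not only the one this file happens to use. A minimal Python sketch of undoing them, assuming the default /Colors 1 and /BitsPerComponent 8, i.e. one byte per pixel; png_unpredict is a hypothetical helper, not a library function:

```python
def _paeth(a: int, b: int, c: int) -> int:
    """PNG Paeth predictor: the neighbour closest to a + b - c (ties: a, then b)."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def png_unpredict(data: bytes, columns: int) -> bytes:
    """Undo PNG row filters, assuming 1 byte per pixel.

    Each row is one filter-type byte followed by `columns` data bytes.
    """
    row_len = columns + 1
    if len(data) % row_len:
        raise ValueError("length is not a multiple of columns + 1")
    prev = bytearray(columns)  # the row above the first row is all zeros
    out = bytearray()
    for start in range(0, len(data), row_len):
        ftype = data[start]
        if ftype > 4:
            raise ValueError(f"unknown PNG filter type {ftype}")
        row = bytearray(data[start + 1:start + row_len])
        for i in range(columns):
            left = row[i - 1] if i > 0 else 0  # already reconstructed
            up = prev[i]
            up_left = prev[i - 1] if i > 0 else 0
            if ftype == 0:      # None
                pred = 0
            elif ftype == 1:    # Sub
                pred = left
            elif ftype == 2:    # Up
                pred = up
            elif ftype == 3:    # Average
                pred = (left + up) // 2
            else:               # Paeth
                pred = _paeth(left, up, up_left)
            row[i] = (row[i] + pred) & 0xFF  # byte-wise, modulo 256
        out += row
        prev = row
    return bytes(out)
```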

So each row is the predictor byte plus one /W [1 2 1] entry, 1 + 4 = 5 bytes, which is why 39 entries take 39 * 5 = 195 bytes:

02 01 00 10 00
02 00 02 cd 00
02 00 01 51 00
02 00 01 70 00
02 00 05 7a 00
02 .. .. .. ..
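The 'Up' reconstruction of these five rows can be checked mechanically. A small Python sketch, with the rows copied from above minus their leading 02 byte; undo_up adds each byte to the byte above it modulo 256, while undo_up_with_carry is a deliberately wrong, hypothetical variant that adds whole 4-byte rows as integers, to show why the per-byte truncation matters:

```python
# The five rows above, with the leading filter-type byte (02 = Up) removed.
RAW_ROWS = [bytes.fromhex(h) for h in
            ("01001000", "0002cd00", "00015100", "00017000", "00057a00")]


def undo_up(rows):
    """PNG 'Up': add each byte to the byte above it, modulo 256, no carry."""
    prev = bytes(len(rows[0]))  # the row above the first row is all zeros
    decoded = []
    for row in rows:
        prev = bytes((r + p) & 0xFF for r, p in zip(row, prev))
        decoded.append(prev)
    return decoded


def undo_up_with_carry(rows):
    """WRONG for PNG, shown for contrast: adds whole rows as 32-bit integers."""
    prev = 0
    decoded = []
    for row in rows:
        prev = (int.from_bytes(row, "big") + prev) & 0xFFFFFFFF
        decoded.append(prev.to_bytes(4, "big"))
    return decoded


def offsets(rows):
    # Bytes 1-2 of each decoded row are the 2-byte offset field of /W [1 2 1].
    return [hex(int.from_bytes(r[1:3], "big")) for r in rows]


print(offsets(undo_up(RAW_ROWS)))             # ['0x10', '0x2dd', '0x32e', '0x49e', '0x918']
print(offsets(undo_up_with_carry(RAW_ROWS)))  # ['0x10', '0x2dd', '0x42e', '0x59e', '0xb18']
```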

After undoing the row filters (type 2, 'Up', for all of these rows), you get this:

01 00 10 00
01 02 dd 00
01 03 2e 00
01 04 9e 00
01 09 18 00
.. .. .. ..

Note: calculated by hand. The PNG 'Up' filter is a byte filter: each byte is added to the corresponding byte of the previous (decoded) row, and each sum is truncated to 8 bits, with no carry into the neighbouring byte.

This gives the following type 1 xref entries ("type 1 entries define objects that are in use but are not compressed (corresponding to n entries in a cross-reference table)." [2]):

#38 type 1: offset 10h, generation 0
#39 type 1: offset 2DDh, generation 0
#40 type 1: offset 32Eh, generation 0
#41 type 1: offset 49Eh, generation 0
#42 type 1: offset 918h, generation 0
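Putting the steps together: a minimal Python sketch of decoding a whole cross-reference stream, assuming /Filter /FlateDecode, either no predictor or a PNG predictor with one byte per pixel, and that the raw stream bytes have already been extracted from the file (object parsing is out of scope). decode_xref_stream and its parameters are hypothetical names; when /Index is absent it defaults to [0 Size], which the caller has to pass in:

```python
import zlib
from typing import Dict, Sequence, Tuple


def _paeth(a: int, b: int, c: int) -> int:
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c


def _png_unpredict(data: bytes, columns: int) -> bytes:
    """Undo PNG row filters (1 byte per pixel): filter byte + `columns` bytes per row."""
    prev = bytearray(columns)
    out = bytearray()
    for start in range(0, len(data), columns + 1):
        ftype = data[start]
        row = bytearray(data[start + 1:start + 1 + columns])
        for i in range(len(row)):
            a = row[i - 1] if i else 0   # left (already reconstructed)
            b = prev[i]                  # up
            c = prev[i - 1] if i else 0  # up-left
            preds = (0, a, b, (a + b) // 2, _paeth(a, b, c))
            if ftype >= len(preds):
                raise ValueError(f"unknown PNG filter type {ftype}")
            row[i] = (row[i] + preds[ftype]) & 0xFF
        out += row
        prev = row
    return bytes(out)


def decode_xref_stream(raw: bytes, w: Sequence[int], index: Sequence[int],
                       columns: int = 1, predictor: int = 1
                       ) -> Dict[int, Tuple[int, int, int]]:
    """Map object number -> (type, field2, field3) for one xref stream."""
    data = zlib.decompress(raw)                  # /Filter /FlateDecode
    if predictor >= 10:                          # PNG predictors
        data = _png_unpredict(data, columns)
    elif predictor != 1:                         # 2 = TIFF predictor
        raise NotImplementedError(f"predictor {predictor} not handled here")
    entries = {}
    pos = 0
    for first, count in zip(index[0::2], index[1::2]):
        for obj in range(first, first + count):
            fields = []
            for width in w:
                fields.append(int.from_bytes(data[pos:pos + width], "big"))
                pos += width
            if w[0] == 0:
                fields[0] = 1                    # omitted type defaults to 1
            entries[obj] = (fields[0], fields[1], fields[2])
    return entries


# The five rows quoted in the question, re-compressed here only as test input.
sample = zlib.compress(bytes.fromhex(
    "0201001000 020002cd00 0200015100 0200017000 0200057a00"))
for obj, entry in decode_xref_stream(sample, [1, 2, 1], [38, 5], 4, 12).items():
    print(obj, entry)
# 38 (1, 16, 0)
# 39 (1, 733, 0)
# 40 (1, 814, 0)
# 41 (1, 1182, 0)
# 42 (1, 2328, 0)
```

With the real stream, the call would use /Index [38 39] instead of [38, 5]; the sample above only contains the five rows quoted in the question.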

[1] See "LZW and Flate Predictor Functions" in PDF Reference 1.7, 6th ed., Section 3.3, Filters.

[2] As described in Table 3.16 of PDF Reference 1.7, the table you cited.

Upvotes: 14
