Writing a streamed cross-reference in PDF: file is detected as damaged?

Question

I am using TypeScript to write a cross-reference stream at the end of my PDF file, as suggested in the answer to my previous post.

Here is the cross-reference table I would have written without a stream:

xref
0 1
0000000000 65535 f 
21 2
0000084670 00000 n 
0000085209 00000 n 
73 6
0000085585 00000 n 
0000086335 00000 n 
0000150988 00000 n 
0000151086 00000 n 
0000151528 00000 n 
0000151707 00000 n 
trailer
<<
/Size 79
/Root 21 0 R
/Info 19 0 R
/Prev 116
>>
startxref
152861
%%EOF

And here is the streamed version:

79 0 obj
<<
/Type /XRef /Filter /FlateDecode
/Root 21 0 R
/Info 19 0 R
/Index [0 1 21 2 73 7 ] /W[1 3 0] /DecodeParms<> /Prev 116 /Length 45 /Size 80>>
stream
(...data..)
endstream
endobj
startxref
152870
%%EOF

As for the content of the stream, here it is in a byte array form. It was deflated using Poko.deflate() with the default compression level:

120,156,99,98,0,2,38,70,70,175,125,76,140,12,76,210,64,130,177,2,196,122,7,36,254,244,2,9,134,36,144,216,46,16,107,51,144,96,105,2,0,141,44,6,140

I reversed the process I found here.

Arranged in rows, the inflated version is as follows:

02 00 00 00 00
02 01 01 4a be // = 84670    -> object 21 is at byte position 84670
02 01 00 02 1b // = 539      -> object 22 is at byte position 85209
02 01 00 01 78 // = 376      -> object 73 is at byte position 85585
02 01 00 02 ee // etc
02 01 00 fc 8d
02 01 00 00 62
02 01 00 01 ba
02 01 00 00 b3
02 01 00 04 82

However, when I try to open the resulting file, all I get is an error saying the file is damaged.

The oddest thing about this process is the leading "02" on each line (found in the referenced post here). However, even without it, the same problem seems to occur. What am I missing?

mkl · Accepted Answer

Three problems can be identified easily.

Incorrect startxref offset

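Your startxref points to the start of the stream dictionary (the "<<"), but it should point to the object number of the indirect object housing the stream, i.e. to the "79" of "79 0 obj". Your value, 152870, is 9 bytes larger than the 152861 of your classic table, which is exactly the length of "79 0 obj" plus one end-of-line byte, so 152861 is probably the offset you want.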
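One way to avoid this kind of off-by-a-few-bytes error is to record the offset at the moment the object is appended, rather than computing it afterwards. The following is only a minimal sketch: `ascii`, `concat` and `appendXrefStream` are hypothetical helper names, not part of any library, and it assumes that everything written so far is available as a single `Uint8Array` ending in an end-of-line.

```typescript
// Encode an ASCII string as bytes (assumption: the PDF syntax written here is pure ASCII).
function ascii(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) out[i] = s.charCodeAt(i) & 0xff;
  return out;
}

// Join several byte chunks into one array.
function concat(chunks: Uint8Array[]): Uint8Array {
  let total = 0;
  for (const c of chunks) total += c.length;
  const out = new Uint8Array(total);
  let pos = 0;
  for (const c of chunks) {
    out.set(c, pos);
    pos += c.length;
  }
  return out;
}

// Hypothetical helper: appends a cross-reference stream object to `body` and
// returns the finished file together with the startxref value it wrote.
function appendXrefStream(
  body: Uint8Array,
  objNum: number,
  dict: string,
  data: Uint8Array
): { file: Uint8Array; startxref: number } {
  // Offset of the first digit of "<objNum> 0 obj", not of the "<<" after it.
  const startxref = body.length;
  const head = ascii(`${objNum} 0 obj\n${dict}\nstream\n`);
  const tail = ascii(`\nendstream\nendobj\nstartxref\n${startxref}\n%%EOF\n`);
  return { file: concat([body, head, data, tail]), startxref };
}
```

With this shape, the bytes at `file[startxref]` are always the digits of the object number, whatever the length of the dictionary.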

Missing predictor bytes

You claim that the arranged inflated version is

02 00 00 00 00
02 01 01 4a be // = 84670    -> object 21 is at byte position 84670
02 01 00 02 1b // = 539      -> object 22 is at byte position 85209
02 01 00 01 78 // = 376      -> object 73 is at byte position 85585
02 01 00 02 ee // etc
...

But inflating your stream gives

00 00 00 00
01 01 4a be
01 00 02 1b
01 00 01 78
01 00 02 ee
...

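That is, you forgot to add the predictor bytes: each row of your actual stream has only 4 bytes, with no leading predictor byte in front of the type and offset bytes.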
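To illustrate what "adding the predictor bytes" means at its simplest, here is a sketch that prefixes every row with a tag byte before deflating. It assumes your /DecodeParms declare a PNG predictor (a /Predictor value of 10 or more) with /Columns 4, which I cannot see in the dictionary as posted. It uses tag 0 ("None"), for which the row bytes are stored unchanged, so no arithmetic is involved. `tagRowsWithNone` is a made-up name for this example.

```typescript
// Prefix each row of `raw` with the PNG filter tag 0 ("None").
// With tag 0 the row bytes that follow are stored as they are.
function tagRowsWithNone(raw: Uint8Array, columns: number): Uint8Array {
  if (columns <= 0 || raw.length % columns !== 0) {
    throw new Error("data length is not a multiple of the row width");
  }
  const rows = raw.length / columns;
  const stride = columns + 1; // one tag byte plus the row data
  const out = new Uint8Array(rows * stride);
  for (let r = 0; r < rows; r++) {
    out[r * stride] = 0; // filter tag: None
    out.set(raw.subarray(r * columns, (r + 1) * columns), r * stride + 1);
  }
  return out;
}

// The first rows of the stream as it inflates today (4 bytes each, /W [1 3 0]).
const untagged = new Uint8Array([
  0x00, 0x00, 0x00, 0x00,
  0x01, 0x01, 0x4a, 0xbe,
  0x01, 0x00, 0x02, 0x1b,
]);
const tagged = tagRowsWithNone(untagged, 4); // 5 bytes per row
```

A reader that expects 5-byte rows would otherwise take the first byte of each 4-byte row as the tag and lose alignment from the second row on.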

Prediction only applied to offsets

You fixed the two errors mentioned above and shared that file. Looking at that file, another error became clear: you only apply the prediction to the offset part of the cross-reference entry, not to the initial byte indicating the type of the entry!

Your inflated stream is now as follows

 02, 01, 01, 4a, be,
 02, 01, 00, 02, 1b,
 02, 01, 00, 01, 78,
 02, 01, 00, 02, ee,
 02, 01, 00, fc, 8d,
 02, 01, 00, 00, 62,
 02, 01, 00, 01, bc,
 02, 01, 00, 00, b3,
 02, 01, 00, 04, 82

Resolving the prediction, therefore, results in

 01, 01, 4a, be,
 02, 01, 4c, d9,
 03, 01, 4d, 51,
 04, 01, 4f, 3f,
 05, 01, 4b, cc,
 06, 01, 4b, 2e,
 07, 01, 4c, ea,
 08, 01, 4c, 9d,
 09, 01, 50, 1f

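This contains a lot of incorrect type bytes: the type field of a cross-reference stream entry may only be 0 (free), 1 (in use, not compressed) or 2 (stored in an object stream), but here it counts up from 01 to 09 because the unpredicted 01 in each row is added to the value above it.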
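The following sketch shows the "Up" prediction (tag 02) applied to every byte of a row, type byte included, together with the matching decoder. It assumes /Columns 4 and handles only tags 0 and 2. The function names are invented for this example and are not taken from any library.

```typescript
// Apply PNG "Up" prediction (tag 2) to all columns of each row.
// Every byte is replaced by its difference to the byte directly above it,
// computed per byte modulo 256 (no borrow between columns).
function pngUpEncode(raw: Uint8Array, columns: number): Uint8Array {
  if (raw.length % columns !== 0) throw new Error("ragged input");
  const rows = raw.length / columns;
  const stride = columns + 1;
  const out = new Uint8Array(rows * stride);
  for (let r = 0; r < rows; r++) {
    out[r * stride] = 2; // filter tag: Up
    for (let c = 0; c < columns; c++) {
      // The row above the first row counts as all zeros.
      const above = r > 0 ? raw[(r - 1) * columns + c] : 0;
      out[r * stride + 1 + c] = (raw[r * columns + c] - above) & 0xff;
    }
  }
  return out;
}

// Undo the prediction. This sketch only handles tag 0 (None) and tag 2 (Up).
function pngDecode(data: Uint8Array, columns: number): Uint8Array {
  const stride = columns + 1;
  if (data.length % stride !== 0) throw new Error("ragged input");
  const rows = data.length / stride;
  const out = new Uint8Array(rows * columns);
  for (let r = 0; r < rows; r++) {
    const tag = data[r * stride];
    if (tag !== 0 && tag !== 2) throw new Error("unsupported filter tag " + tag);
    for (let c = 0; c < columns; c++) {
      const above = tag === 2 && r > 0 ? out[(r - 1) * columns + c] : 0;
      out[r * columns + c] = (data[r * stride + 1 + c] + above) & 0xff;
    }
  }
  return out;
}

// Entries for objects 21, 22 and 73: type 1, offsets 84670, 85209, 85585.
const rawRows = new Uint8Array([
  0x01, 0x01, 0x4a, 0xbe,
  0x01, 0x01, 0x4c, 0xd9,
  0x01, 0x01, 0x4e, 0x51,
]);
const predicted = pngUpEncode(rawRows, 4);
```

After the first row, the type byte in each predicted row is 00, because the type does not change from one row to the next. Note also that the difference is taken byte by byte, so it is not always the same as the difference between the two offsets written as a 3-byte number.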
