Reputation: 5834
Not really a programming question but I'm running out of ideas.
I generate this PDF file: http://www.o2sol.com/download/Sample_ObjectCompression_CryptBad.pdf
Every PDF viewer I tried (Chrome, Edge, Foxit, SumatraPDF, Xodo, etc.) opens the file, but Adobe Acrobat cannot. It reports: "The file is damaged and it cannot be repaired."
Can somebody give me a hint what is wrong with the file?
Disclaimer: the PDF file is generated with PDF4NET, the library I work on.
Update:
I fixed the offset for object 10 but the file still cannot be opened with Acrobat.
I created 2 updated files:
http://www.o2sol.com/download/Sample_ObjectCompression_CryptBad2.pdf - the file is just encrypted with RC4, no compression on the object stream or xref stream
http://www.o2sol.com/download/Sample_ObjectCompression_NoCrypt2.pdf - the file is not encrypted, no compression on the object stream or xref stream. The encrypt object has been replaced by document information to keep the same object numbers and offsets.
Both files have the same xref stream and object stream. Acrobat still cannot open CryptBad2, so I suspect an encryption problem. However, if I encrypt the file but drop the object compression, Acrobat opens it without problems.
Upvotes: 0
Views: 451
Reputation: 1215
The problem appears to lie in the object stream's stream data, which does not seem to be encoded correctly. Attempting to decode it produces no data, so perhaps something is going awry in the Flate encoding process.
The PDF Library finds the same problem when looking for objects in the object stream and raises an error (which is likely the same problem that manifests in Acrobat since that's what Acrobat uses to open the document).
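One way to reproduce that check outside a viewer is to extract the raw bytes between `stream` and `endstream` and try to inflate them. The following is only a minimal sketch: `try_inflate` is a hypothetical helper name, and it assumes the bytes have already been decrypted if the file is encrypted, since decryption has to happen before the Flate filter is applied.

```python
import zlib


def try_inflate(raw: bytes):
    """Return the inflated stream data, or None if it is not valid Flate data."""
    try:
        return zlib.decompress(raw)
    except zlib.error:
        # Corrupt, truncated, or still-encrypted data ends up here.
        return None


# Round trip with data we compressed ourselves, then a deliberately bad input.
good = zlib.compress(b"1 0 2 5")
print(try_inflate(good))
print(try_inflate(b"not flate"))
```

If the helper returns None for the object stream but succeeds for other streams in the same file, that points at how this one stream was written rather than at the file as a whole.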
There also appear to be some junk bytes (11 bytes) at offset 0x0A, just after the header and just before object ID 1:
25 D8 D8 D8 D8 D8 D8 D8 D8 D8 D8 D8 D8 D8 D8 D8 D8 D8 D8 D8
(Perhaps meant to be a comment.)
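The first byte, 0x25, is `%`, which starts a comment in PDF syntax. PDF files that contain binary data conventionally carry a comment line right after the header with at least four bytes of value 128 or higher, so these bytes may simply be that marker. A rough sketch of the check, where `looks_like_binary_marker` is a hypothetical helper and the sample is a shortened stand-in for the bytes above:

```python
def looks_like_binary_marker(line: bytes) -> bool:
    """True if the line is a '%' comment holding at least four bytes >= 128."""
    high_bytes = sum(1 for b in line[1:] if b >= 128)
    return line.startswith(b"%") and high_bytes >= 4


# Shortened sample: '%' followed by a few 0xD8 bytes.
sample = b"%" + b"\xd8" * 4
print(looks_like_binary_marker(sample))
print(looks_like_binary_marker(b"1 0 obj"))
```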
Upvotes: 0
Reputation: 5058
Object 10 (the cross-reference stream itself) has no valid entry in itself. Its fields are:
01 00 00 00
By W [1 2 1], this is a type 1 entry (an uncompressed object) located at byte offset 0 with generation 0, and offset 0 is certainly wrong.
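To make the field split concrete, here is a small sketch of how decoded cross-reference stream data can be cut into entries according to the W array. `parse_xref_entries` is a hypothetical helper, not part of any library, and it ignores the default values that the PDF specification defines for zero-width fields.

```python
def parse_xref_entries(data: bytes, widths):
    """Split decoded xref stream data into tuples of big-endian integer fields."""
    size = sum(widths)
    entries = []
    for start in range(0, len(data) - size + 1, size):
        fields = []
        pos = start
        for w in widths:
            fields.append(int.from_bytes(data[pos:pos + w], "big"))
            pos += w
        entries.append(tuple(fields))
    return entries


# The entry for object 10 quoted above, read with W [1 2 1].
entry_type, offset, generation = parse_xref_entries(
    bytes.fromhex("01000000"), [1, 2, 1]
)[0]
print(entry_type, offset, generation)
```

For a type 1 entry the second field is the byte offset of the object from the start of the file, so a valid entry for the xref stream object should point at the position of its own `10 0 obj` line.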
Upvotes: 2
Reputation: 208
Acrobat is likely attempting to read the entire object stream before opening the file, whereas the other viewers mentioned might manage a partial read and display whatever they were able to load.
Upvotes: 0