Reputation: 140
Whenever I try to open a PDF file generated by MS Office, I get this error:
com.itextpdf.kernel.PdfException: Append mode requires a document without errors, even if recovery is possible.
at com.itextpdf.kernel.pdf.PdfDocument.open
An example of such a file: https://drive.google.com/open?id=1fnwtXfEGg6BIeVuAi-l28Ol_dxbCd12F. Below is a snippet of the code I use to open it. My goal is to create a detached signature; this works fine for every file that was not generated by MS Office.
PdfReader reader = new PdfReader(docPath);
StampingProperties properties = new StampingProperties();
properties.useAppendMode();
//This is where the error is thrown.
PdfSigner signer = new PdfSigner(reader, new FileOutputStream(outputPath), properties);
I have read this question, which is basically the same problem: "Append mode requires a document without errors, even if recovery is possible". I tried what Lowagie suggests there, removing the following bytes:
xref
0 24
0000000000 65535 f
0000011981 00000 n
0000000239 00000 n
0000003212 00000 n
0000000022 00000 n
0000000220 00000 n
0000000343 00000 n
0000003176 00000 n
0000000000 00000 n
0000003345 00000 n
0000000440 00000 n
0000003155 00000 n
0000003295 00000 n
0000003863 00000 n
0000003519 00000 n
0000003843 00000 n
0000004099 00000 n
0000011737 00000 n
0000011758 00000 n
0000011803 00000 n
0000011877 00000 n
0000011900 00000 n
0000011942 00000 n
0000011961 00000 n
trailer
<< /Size 24 /Root 12 0 R /Info 1 0 R /ID [ <8e4b8658dd1d1f745bdf99a0eb05bb97>
<8e4b8658dd1d1f745bdf99a0eb05bb97> ] >>
startxref
12125
%%EOF
But then the PDF was reported as broken and stopped working. I also tried leaving the %%EOF in place but got the same result.
So two things:
1) Is there a fix for the bug discussed by Lowagie and MKL?
2) What could be a workaround for this problem?
Upvotes: 2
Views: 935
Reputation: 95918
First of all, the question you refer to is not about basically the same problem; you merely get the same error message. The PDF in question there is a hybrid reference PDF, which your file is not: your file has only a single cross reference table and a single trailer, while a hybrid reference PDF has (at least) two cross reference tables, each followed by a trailer, and the latter trailer has an XRefStm entry pointing to a cross reference stream. Hybrid reference PDFs are valid; iText 7 used to have problems with such PDFs, and that was a bug.
Your PDF file, on the other hand, actually contains an error itself: the cross reference table claims that object 8 is in use and located at file offset 0:
xref
0 24
...
0000000000 00000 n
This cannot be true because the PDF header is at the start of the file. Furthermore, the first object after the header is object 4, so one cannot even argue that the entry is meant to point to the first object following that offset.
iText 7 only allows append mode if it has not found an issue in the source file, which is reasonable.
So if you get that error reproducibly, you should file a bug with the PDF producer.
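The check described above can be reproduced without iText. The following is only a minimal sketch using plain Java: the class and method names (`XrefSanityCheck`, `inUseAtOffsetZero`) are made up for illustration, the input is a shortened excerpt of the table from the question (objects 0 to 8 only), and it assumes a single classic cross reference subsection, as in this file.

```java
import java.util.ArrayList;
import java.util.List;

class XrefSanityCheck {

    /**
     * Returns the object numbers of in-use ("n") entries whose offset is 0
     * in a single classic cross reference subsection.
     */
    static List<Integer> inUseAtOffsetZero(String xrefSection) {
        List<Integer> suspicious = new ArrayList<>();
        String[] lines = xrefSection.trim().split("\\R");
        // lines[0] is "xref", lines[1] is "<first object number> <entry count>"
        String[] header = lines[1].trim().split("\\s+");
        int firstObject = Integer.parseInt(header[0]);
        int count = Integer.parseInt(header[1]);
        for (int i = 0; i < count && i + 2 < lines.length; i++) {
            String[] entry = lines[i + 2].trim().split("\\s+");
            if (entry.length < 3) {
                continue; // not an "offset generation type" entry
            }
            long offset = Long.parseLong(entry[0]);
            boolean inUse = entry[2].equals("n");
            // Offset 0 holds the PDF header, so no in-use object can start there.
            if (inUse && offset == 0) {
                suspicious.add(firstObject + i);
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        // Shortened excerpt of the table from the question (objects 0..8).
        String xref = String.join("\n",
                "xref",
                "0 9",
                "0000000000 65535 f",
                "0000011981 00000 n",
                "0000000239 00000 n",
                "0000003212 00000 n",
                "0000000022 00000 n",
                "0000000220 00000 n",
                "0000000343 00000 n",
                "0000003176 00000 n",
                "0000000000 00000 n");
        System.out.println("In-use entries at offset 0: " + inUseAtOffsetZero(xref));
        // prints: In-use entries at offset 0: [8]
    }
}
```

Object 0 is not reported because it is marked free (`f`); only object 8 claims to be in use at offset 0.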
You claim that the PDF file was generated by MS Office. The metadata of your PDF, on the other hand, indicate that while MS Word is the creator of the document, the actual producer of the PDF is Quartz PDFContext. You may want to file an issue for Quartz PDFContext.
A work-around for you would be to catch this exception and try again without append mode.
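The control flow of that retry could look roughly like the sketch below. It deliberately does not call iText, so it runs with plain Java: `AppendModeFallback`, `Opener` and `openWithFallback` are hypothetical names, and `IllegalStateException` merely stands in for the `PdfException` from the stack trace. In real code the lambda would build the `PdfSigner`, calling `useAppendMode()` only when `appendMode` is true.

```java
import java.util.ArrayList;
import java.util.List;

class AppendModeFallback {

    /** Hypothetical abstraction: opens the document with or without append mode. */
    @FunctionalInterface
    interface Opener<T> {
        T open(boolean appendMode) throws Exception;
    }

    /**
     * Tries append mode first; if that fails with the given exception type,
     * retries once without append mode. Other exceptions propagate unchanged.
     */
    static <T> T openWithFallback(Opener<T> opener,
                                  Class<? extends Exception> appendFailure) throws Exception {
        try {
            return opener.open(true);
        } catch (Exception e) {
            if (!appendFailure.isInstance(e)) {
                throw e;
            }
            // The opener is invoked again, so it can (and probably should)
            // create a fresh reader and output stream for the second attempt.
            return opener.open(false);
        }
    }

    public static void main(String[] args) throws Exception {
        List<Boolean> attempts = new ArrayList<>();
        // Stand-in for the real call: fails in append mode, succeeds otherwise.
        String result = openWithFallback(appendMode -> {
            attempts.add(appendMode);
            if (appendMode) {
                throw new IllegalStateException(
                        "Append mode requires a document without errors");
            }
            return "opened without append mode";
        }, IllegalStateException.class);
        System.out.println(attempts + " -> " + result);
        // prints: [true, false] -> opened without append mode
    }
}
```

Keep in mind that without append mode the whole file is rewritten, so this fallback is presumably only acceptable for documents that do not already carry signatures.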
Alternatively, if you really, really need to process these damaged files in append mode, you can make the PdfReader lie about the found error by overriding hasRebuiltXref with a method that always returns false, e.g. by replacing
PdfReader reader = new PdfReader(SOURCE);
by
PdfReader reader = new PdfReader(SOURCE) {
    @Override
    public boolean hasRebuiltXref() {
        return false;
    }
};
(StampNoChange test testAppendTest)
Be aware, though, that the resulting PDF still contains the error iText identified in the original file. Thus, any PDF processor that handles your PDF later may also stumble, either the same way iText originally did or in some other, possibly spectacular, way.
Upvotes: 2