iText 7.0.4.0 - PdfWriter produces corrupted PDF for certain PDF file inputs

Question

I'm having an issue where PdfWriter from iText 7.0.4.0 (.NET 4.5.1) produces corrupted PDF documents for certain input PDF files.

To elaborate, PDF files with well-formed paragraphs have no issues. However, if the input PDF contains irregular content (for lack of a better word; please refer to the samples on Google Drive), PdfWriter produces corrupted PDF files. By corrupted, I mean that the file can be opened, but it shows a blank page at an extremely high zoom level (in Adobe Reader XI). Corrupted samples are also provided at the same Google Drive link.

Sample code:

using iText.Kernel.Pdf; // PdfReader, PdfWriter, PdfDocument

using (var pdfReader = new PdfReader("sample1_input.pdf"))
{
    // Stamping mode: read the input and write a (possibly modified) copy to the output file.
    PdfDocument pdfDoc = new PdfDocument(pdfReader, new PdfWriter("sample1_corrupted_output.pdf"));

    // Trying to highlight a part of PDF by referencing this example:
    // https://developers.itextpdf.com/examples/stamping-content-existing-pdfs/clone-highlighting-text
    // Commented out for now because PdfWriter is producing corrupted PDF documents for the samples and similar PDF files.
    //PdfCanvas canvas = new PdfCanvas(pdfDoc.GetFirstPage());
    //canvas.SaveState();
    //canvas.SetExtGState(new PdfExtGState().SetFillOpacity(0.1f)); // set after SaveState so RestoreState also resets the opacity
    //canvas.SetFillColor(Color.YELLOW);
    //canvas.Rectangle(100, 100, 200, 200);
    //canvas.Fill();
    //canvas.RestoreState();

    pdfDoc.Close(); // Corrupted PDF file is produced, even without highlighting.
}

One "interesting" thing I noticed is that if I provide "new StampingProperties().UseAppendMode()" as the third parameter of PdfDocument (without the highlighting code), PdfWriter spits out the original file (although a few KB larger than the original for some reason). However, PdfWriter goes back to producing corrupted PDFs when the highlighting code is uncommented.
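The size difference in append mode is expected: append mode writes a PDF incremental update, i.e. the original bytes are kept unchanged and only new or changed objects plus a new cross-reference section and trailer are appended after them. A minimal sketch to confirm that an output file really is append-only relative to its input (`IncrementalUpdateCheck` is a hypothetical helper, not part of iText):

```csharp
using System.IO;

static class IncrementalUpdateCheck
{
    // Returns true if 'updated' starts with every byte of 'original',
    // i.e. the writer only appended data after the original file.
    public static bool IsAppendOnly(byte[] original, byte[] updated)
    {
        if (updated.Length < original.Length) return false;
        for (int i = 0; i < original.Length; i++)
        {
            if (updated[i] != original[i]) return false;
        }
        return true;
    }

    // Convenience overload that compares two files on disk.
    public static bool IsAppendOnly(string originalPath, string updatedPath) =>
        IsAppendOnly(File.ReadAllBytes(originalPath), File.ReadAllBytes(updatedPath));
}
```

For example, `IncrementalUpdateCheck.IsAppendOnly("sample1_input.pdf", "sample1_appended_output.pdf")` (the second file name is hypothetical) should return true for an append-mode output.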

Link to sample files: https://drive.google.com/open?id=0B3NPOZswWocQV09KMW5fbFVyUm8
sample1_input.pdf (input sample #1) -> sample1_corrupted_output.pdf (corrupted output)
sample2_input.pdf (input sample #2) -> sample2_corrupted_output.pdf (corrupted output)

Please kindly give some advice.

mkl · Accepted Answer

The cause of this corruption is an unusual structure of the page tree in the PDFs in question.

It is unusual in two ways:

It has a subtree without any page objects (dictionaries 17 and 21).
It has a node with mixed child node types (dictionary 10 has both a Page child, object 3, and a Pages child, object 17).
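To make the two quirks concrete, here is a minimal C# sketch over a hypothetical, simplified model of the page tree (`PageTreeNode` and `PageTreeQuirks` are illustrative names, not iText types). It flags Pages nodes whose subtree contains no Page leaf and Pages nodes whose Kids mix Page and Pages entries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified page tree model: a node is either a /Page leaf
// or a /Pages node with /Kids. Object numbers are kept only for reporting.
sealed class PageTreeNode
{
    public int ObjNum { get; }
    public bool IsPage { get; }
    public List<PageTreeNode> Kids { get; } = new List<PageTreeNode>();

    public PageTreeNode(int objNum, bool isPage, params PageTreeNode[] kids)
    {
        ObjNum = objNum;
        IsPage = isPage;
        Kids.AddRange(kids);
    }

    // Number of /Page leaves at or below this node (what /Count should be for a /Pages node).
    public int LeafCount() => IsPage ? 1 : Kids.Sum(k => k.LeafCount());
}

static class PageTreeQuirks
{
    // Quirk 1: /Pages nodes whose subtree contains no /Page leaf at all.
    public static List<int> PagelessSubtrees(PageTreeNode root)
    {
        var result = new List<int>();
        Visit(root, n => { if (!n.IsPage && n.LeafCount() == 0) result.Add(n.ObjNum); });
        return result;
    }

    // Quirk 2: /Pages nodes that have both /Page and /Pages children.
    public static List<int> MixedKidNodes(PageTreeNode root)
    {
        var result = new List<int>();
        Visit(root, n =>
        {
            if (!n.IsPage && n.Kids.Any(k => k.IsPage) && n.Kids.Any(k => !k.IsPage))
                result.Add(n.ObjNum);
        });
        return result;
    }

    // Pre-order traversal of the tree.
    static void Visit(PageTreeNode node, Action<PageTreeNode> action)
    {
        action(node);
        foreach (var kid in node.Kids) Visit(kid, action);
    }
}
```

Modeling the sample fragment as node 10 with kids 3 (Page) and 17 (Pages), and assuming 21 is the only kid of 17 and has no kids itself (the answer only says both are page-less), `PagelessSubtrees` yields 17 and 21, and `MixedKidNodes` yields 10.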

If one removes the page-less subtree (by removing object 17 from the Kids array of object 10), both quirks disappear and iText no longer produces a corrupted output.
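The manual fix can be sketched on the same kind of simplified model: drop every Pages kid whose subtree contains no Page leaf. Because such a subtree contributes 0 to every ancestor's /Count, removing it leaves all /Count values valid. This is a hypothetical illustration (`PageTreeNode` and `PageTreeRepair` are not iText types); applying it to a real file would mean editing the /Kids array of the actual Pages dictionary, which this sketch does not do.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified page tree model: a /Page leaf or a /Pages node with /Kids.
sealed class PageTreeNode
{
    public int ObjNum { get; }
    public bool IsPage { get; }
    public List<PageTreeNode> Kids { get; } = new List<PageTreeNode>();

    public PageTreeNode(int objNum, bool isPage, params PageTreeNode[] kids)
    {
        ObjNum = objNum;
        IsPage = isPage;
        Kids.AddRange(kids);
    }

    // Number of /Page leaves at or below this node.
    public int LeafCount() => IsPage ? 1 : Kids.Sum(k => k.LeafCount());
}

static class PageTreeRepair
{
    // Removes every /Pages kid whose subtree holds no /Page leaf, mirroring the
    // manual fix of deleting object 17 from the /Kids of object 10.
    // Returns the object numbers of the removed subtree roots.
    public static List<int> PrunePagelessKids(PageTreeNode node)
    {
        var removed = new List<int>();
        if (node.IsPage) return removed;
        foreach (var kid in node.Kids.ToList()) // iterate over a copy while mutating Kids
        {
            if (!kid.IsPage && kid.LeafCount() == 0)
            {
                node.Kids.Remove(kid);
                removed.Add(kid.ObjNum);
            }
            else
            {
                removed.AddRange(PrunePagelessKids(kid));
            }
        }
        return removed;
    }
}
```

On the modeled fragment (10 with kids 3 and 17, 17 with kid 21), pruning removes only 17; node 10 is then left with a single Page kid, so neither quirk remains.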

While both quirks are weird, I don't see anything in ISO 32000-1 (unfortunately I don't have a copy of ISO 32000-2 yet) indicating that these unusual structures are explicitly forbidden. Thus, I would assume this is an iText bug.

I could reproduce the problem with iText 7.0.4 for Java, but not with the current development SNAPSHOT of 7.0.5.

Indeed, there is a commit dated 2017-09-19 10:03:37 [c0b35f0] described as "Fix bugs in pages tree rebuilding" with differences in the PdfPagesTree class in a code block described as "handle mix of PdfPage and PdfPages". Thus, the issue appears to be known and already fixed.

You may either wait for the 7.0.5 release or look for a 7.0.4.x hotfix.
