JPuzzled

Reputation: 1

Why is my XREF table corrupt after removing or adding annotations?

I am evaluating iText 7 as an alternative to our current PDF processing packages. I am creating a PoC where the input is a PDF document with containers (i.e. annotations). The logic should remove existing QR codes and then add new ones in the containers.

While the logic seems to work and the codes are removed and/or added, the modified file still has issues: it is reported to contain errors in the XREF table. The error also appears if I only remove or only add a code, so I suspect the cause is in the common code.

There seems to be no issue when the modified file is opened in a browser. When it is opened in an advanced PDF editor, a warning is shown. Resaving the file with the editor seems to fix it, but that is not a solution.

I wonder where I went wrong and what to do to fix this or investigate it further. With the code below it is easy to reproduce. Any thoughts on this?

I'm testing with iText 7.1.11 in a .NET Core console application written with C#.

Main code for the process:

using (var pdfFileStream = new MemoryStream(pdfFileBytes))
{
  using (var outputFileStream = new ByteArrayOutputStream())
  {
    var pdfReader = new PdfReader(pdfFileStream);
    var pdfWriter = new PdfWriter(outputFileStream);
    var pdfDocument = new PdfDocument(pdfReader, pdfWriter);

    var pageCount = pdfDocument.GetNumberOfPages();
    PdfPage page = null;
    IList<PdfAnnotation> annotations = null;
    for (int i = 0; i < pageCount; i++)
    {
      page = pdfDocument.GetPage(i + 1);
      annotations = page.GetAnnotations();

      if (annotations.Count == 0)
      {
        continue;
      }

      // Get QR code containers
      var containers = FilterAnnotations(annotations, containerName);

      if (containers.Count == 0)
      {
        continue;
      }

      // Get QR codes instances
      var instances = FilterAnnotations(annotations, instanceName);

      if (instances.Count > 0)
      {
        foreach (var instance in instances)
        {
          RemoveQRCode(instance, page);
        }
      }

      foreach (var container in containers)
      {
        AddQRCode(container, page, pdfDocument, url);
      }

    }

    pdfDocument.Close();

    return outputFileStream.ToArray();
  }
}

I filter out the annotations I'd like to remove (i.e. the codes) by subject. The same goes for finding the containers in which to put the code annotations:

private List<PdfAnnotation> FilterAnnotations(IList<PdfAnnotation> annotations, string subject)
{
  // OrdinalIgnoreCase already ignores case, so no extra lowercasing is needed
  return annotations.Where(a =>
    !string.IsNullOrEmpty(GetSubject(a)) &&
    string.Equals(GetSubject(a), subject, StringComparison.OrdinalIgnoreCase)).ToList();
}

Then I loop through the filtered annotations to remove them:

private void RemoveQRCode(PdfAnnotation annotation, PdfPage page)
{
  page.RemoveAnnotation(annotation);
}

The following code describes adding the codes:

private void AddQRCode(PdfAnnotation container, PdfPage page, PdfDocument document, string url)
{
  BarcodeQRCode qrCode = new BarcodeQRCode(url);
  PdfFormXObject qrCodeForm = qrCode.CreateFormXObject(ColorConstants.BLACK, document);

  var rectangle = GetRectangle(container);

  var canvas = new PdfCanvas(page);
  canvas
    .SaveState()
    .AddXObject(qrCodeForm, rectangle)
    .RestoreState();

  canvas.Release();

  PdfStampAnnotation stampAnnotation = new PdfStampAnnotation(rectangle);
  stampAnnotation.SetSubject(new PdfString(instanceName));
  stampAnnotation.SetAppearance(PdfName.N, new PdfAnnotationAppearance(qrCodeForm.GetPdfObject()));
  stampAnnotation.SetFlags(PdfAnnotation.PRINT);

  page.AddAnnotation(stampAnnotation);
}

I left out some code for brevity.

Thanks in advance.

Update 1

As requested by @mkl, I have added two files, of which test2_mod is the result. I also added more code. Thank you for your fast response.

Upvotes: 0

Views: 2213

Answers (1)

mkl

Reputation: 95928

The problem reported by PDF-XChange Editor Plus here is indeed exactly the one explained in the PDF-XChange forum entry referenced from the forum entry you linked in a comment:

Problem is very simple - XRef stream has incorrect /Index entry.

/Index[0 185]/Size 186

File has only one XRef section, which describe all objects in file, so this entry is optional. /Size entry is correct, since document has objects with maximum number 185, but /Index entry say that it has only 185 records instead of 186 required, and XRef stream itself does not contain record about last object (185, XRef stream itself). Normally this does not prevent PDF files to be opened by most readers, but does not conform PDF specification.

(Response by Lzcat - Tracker Supp)

When using cross reference streams, iText 7 also does not add an entry for the cross reference stream itself to the cross references, in case of your example document:

80 0 obj 
  [...] /Index[0 80] [...] /Size 81

The array value of Index here means that this cross reference stream contains 80 entries starting at 0, i.e. entries for objects 0..79. But the cross reference stream itself is object 80, so it contains no entry for itself.
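
To make that reading of Index concrete, here is a minimal sketch in plain Java (no iText involved). The class and method names are made up for illustration; the only thing taken from the specification is that Index is a list of (first object number, entry count) pairs. It checks whether a given object number is covered by an Index array:

```java
final class XrefIndexCheck {

    /**
     * Returns true if objectNumber lies inside one of the (first, count)
     * subsections declared by an /Index array.
     */
    static boolean coversObject(int[] index, int objectNumber) {
        if (index.length % 2 != 0) {
            throw new IllegalArgumentException("/Index must hold (first, count) pairs");
        }
        for (int i = 0; i < index.length; i += 2) {
            int first = index[i];
            int count = index[i + 1];
            // A subsection covers object numbers first .. first + count - 1.
            if (objectNumber >= first && objectNumber < first + count) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // iText output discussed above: xref stream is object 80, /Index[0 80].
        System.out.println(coversObject(new int[] {0, 80}, 80));
        // PDF-XChange forum example: xref stream is object 185, /Index[0 185].
        System.out.println(coversObject(new int[] {0, 185}, 185));
        // What an Index that includes the stream itself would look like.
        System.out.println(coversObject(new int[] {0, 81}, 80));
    }
}
```

For both files discussed here the check fails for the object number of the cross reference stream itself, which is exactly what the editor complains about.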

But indeed, the PDF specification requires:

Like any stream, a cross-reference stream shall be an indirect object. Therefore, an entry for it shall exist in either a cross-reference stream (usually itself) or in a cross-reference table (in hybrid-reference files; see 7.5.8.4, "Compatibility with Applications That Do Not Support Compressed Reference Streams").

(ISO 32000-1, section 7.5.8.3 "Cross-Reference Stream Data")

By the way, the example PDF from that support forum thread has creator and producer entries claiming that it has been created by "Apitron PDF Kit". Apparently iText is not the only library with this quirk.

A quick fix

I tried a quick fix for the issue; as I'm more into Java, I did it for the Java version of iText:

protected void writeXrefTableAndTrailer(PdfDocument document, PdfObject fileId, PdfObject crypto) throws IOException {
    [...]
    if (writer.isFullCompression()) {
        [...]
        xrefStream.put(PdfName.Size, new PdfNumber(this.size()));

        // vvv--- add these two lines
        xrefStream.getIndirectReference().setOffset(startxref);
        sections = createSections(document, false);

        int offsetSize = getOffsetSize(Math.max(startxref, size()));

(kernel module class PdfXrefTable)

It should look very similar in the C# version of iText 7; only the initial letters of the method names are capitalized.

Beware, though, this is merely smoke tested. For production use, much more testing is necessary.

Upvotes: 1
