Why is my XREF table corrupt after removing or adding annotations?

Question

I am evaluating iText 7 as an alternative to our current PDF processing packages. I am creating a PoC where the input is a PDF document with containers (i.e. annotations). The logic should remove existing QR codes and then add new ones in the containers.

While the logic seems to work and the codes are removed and/or added, there are still issues with the modified file: it is reported to contain errors in the XREF table. The error also appears if I only remove or only add a code, so I suspect the cause lies in the common code.

There seems to be no issue when the modified file is opened in a browser. When it is opened with an advanced PDF editor, a warning is shown. Resaving the file with the editor seems to fix it, but that is not a solution.
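To check this independently of any viewer, a rough sketch like the one below could be used. It is not part of iText; the class and method names (XrefCheck, StartXrefLooksValid) are made up for illustration. It only tests whether the last startxref offset in the file points at something that looks like a cross-reference section (a classic "xref" table or an "N G obj" cross-reference stream). It does not validate the individual entries or follow /Prev chains, so a passing result does not prove the table is correct.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not an iText API.
public static class XrefCheck
{
    // Returns true if the offset after the last "startxref" keyword points at
    // a classic "xref" table or at an indirect object (cross-reference stream).
    public static bool StartXrefLooksValid(byte[] pdf)
    {
        // ISO-8859-1 maps each byte to exactly one char,
        // so string indices are identical to byte offsets.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(pdf);

        int marker = text.LastIndexOf("startxref", StringComparison.Ordinal);
        if (marker < 0)
        {
            return false;
        }

        // The offset is the decimal number that follows the keyword.
        string tail = text.Substring(marker + "startxref".Length).TrimStart();
        int digits = 0;
        while (digits < tail.Length && tail[digits] >= '0' && tail[digits] <= '9')
        {
            digits++;
        }
        if (digits == 0 || !long.TryParse(tail.Substring(0, digits), out long offset))
        {
            return false;
        }
        if (offset >= text.Length)
        {
            return false;
        }

        // Look at the first few bytes at the offset.
        int start = (int)offset;
        string target = text.Substring(start, Math.Min(32, text.Length - start));

        if (target.StartsWith("xref", StringComparison.Ordinal))
        {
            return true; // classic cross-reference table
        }
        // Cross-reference stream: "<object number> <generation> obj"
        return Regex.IsMatch(target, @"^\d+\s+\d+\s+obj");
    }
}
```

Calling XrefCheck.StartXrefLooksValid on the byte array returned by the processing code would at least show whether the final pointer is off, which is one of the things an editor may complain about.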

I wonder where I went wrong and what to do to fix this or investigate it further. With the code below it is easy to reproduce. Any thoughts on this?

I'm testing with iText 7.1.11 in a .NET Core console application written with C#.

Main code for the process:

using (var pdfFileStream = new MemoryStream(pdfFileBytes))
{
  using (var outputFileStream = new ByteArrayOutputStream())
  {
    var pdfReader = new PdfReader(pdfFileStream);
    var pdfWriter = new PdfWriter(outputFileStream);
    var pdfDocument = new PdfDocument(pdfReader, pdfWriter);

    var pageCount = pdfDocument.GetNumberOfPages();
    PdfPage page = null;
    IList<PdfAnnotation> annotations = null;
    for (int i = 0; i < pageCount; i++)
    {
      page = pdfDocument.GetPage(i + 1);
      annotations = page.GetAnnotations();

      if (annotations.Count == 0)
      {
        continue;
      }

      // Get QR code containers
      var containers = FilterAnnotations(annotations, containerName);

      if (containers.Count == 0)
      {
        continue;
      }

      // Get QR codes instances
      var instances = FilterAnnotations(annotations, instanceName);

      if (instances.Count > 0)
      {
        foreach (var instance in instances)
        {
          RemoveQRCode(instance, page);
        }
      }

      foreach (var container in containers)
      {
        AddQRCode(container, page, pdfDocument, url);
      }
    }

    pdfDocument.Close();

    return outputFileStream.ToArray();
  }
}

I select the annotations I'd like to remove (i.e. the codes) by subject. The same goes for finding the containers in which to put the code annotations:

private List<PdfAnnotation> FilterAnnotations(IList<PdfAnnotation> annotations, string subject)
{
  // Case-insensitive match on the annotation subject; annotations without a subject are skipped.
  return annotations.Where(a =>
    !string.IsNullOrEmpty(GetSubject(a)) &&
    GetSubject(a).Equals(subject, StringComparison.OrdinalIgnoreCase)).ToList();
}

Then I loop through the filtered annotations to remove them:

private void RemoveQRCode(PdfAnnotation annotation, PdfPage page)
{
  page.RemoveAnnotation(annotation);
}

The following code describes adding the codes:

private void AddQRCode(PdfAnnotation container, PdfPage page, PdfDocument document, string url)
{
  BarcodeQRCode qrCode = new BarcodeQRCode(url);
  PdfFormXObject qrCodeForm = qrCode.CreateFormXObject(ColorConstants.BLACK, document);

  var rectangle = GetRectangle(container);

  var canvas = new PdfCanvas(page);
  canvas
    .SaveState()
    .AddXObject(qrCodeForm, rectangle)
    .RestoreState();

  canvas.Release();

  PdfStampAnnotation stampAnnotation = new PdfStampAnnotation(rectangle);
  stampAnnotation.SetSubject(new PdfString(instanceName));
  stampAnnotation.SetAppearance(PdfName.N, new PdfAnnotationAppearance(qrCodeForm.GetPdfObject()));
  stampAnnotation.SetFlags(PdfAnnotation.PRINT);

  page.AddAnnotation(stampAnnotation);
}

I left out code for brevity.

Thanks in advance.

Update 1

As requested by @mkl, I have added two files, of which test2_mod is the result. I also added more code. Thank you for your fast response.

Answers (1)

A quick fix