Guyard

Reputation: 119

PDFBox: small PDF file can't be opened as a PDFormXObject

I have a small PDF file that PDFBox can't open. After some analysis, I found that PDFCloneUtility.cloneMerge() consumes an extreme amount of memory and CPU time: it either never finishes or exits with a Java heap space error, even with 32 GB of RAM. Converting the same file to an image, however, is not a problem and is really fast.

What is special or wrong with my PDF? PDF File

PDDocument pdDocument = PDDocument.load(imported);
new LayerUtility(pdDocument).importPageAsForm(pdDocument, 0);

Upvotes: 1

Views: 221

Answers (1)

mkl

Reputation: 95928

Indeed, importPageAsForm appears not to be fully tested for the case where the source and target documents are identical.

In the case at hand the PDF has OCGs (optional content groups, in some GUIs also called layers), so the LayerUtility attempts to import OCGs from source to target, i.e. into itself.

Unfortunately, the PDFCloneUtility used underneath does not expect this, and its cloneMerge method runs into a never-ending loop in

                  COSArray array = (COSArray) base;
                  for (int i = 0; i < array.size(); i++)
                  {
                      ((COSArray) target).add(cloneForNewDocument(array.get(i)));
                  }

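where base, array, and target point to the same COSArray. Every add therefore also increases array.size(), so the loop condition never becomes false and the array grows until the heap is exhausted.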
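The mechanism can be reproduced without PDFBox. The following is only a minimal sketch using plain java.util lists in place of COSArray; the class SelfAppendDemo, the method appendAll, and the maxSteps cap are hypothetical names introduced for illustration (the cap exists solely so that the demo terminates, the PDFBox loop has no such limit).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SelfAppendDemo {

    /**
     * Same loop shape as in cloneMerge: append every element of base to target.
     * Stops after maxSteps iterations so the demo cannot run forever.
     * Returns the number of iterations actually performed.
     */
    static int appendAll(List<String> base, List<String> target, int maxSteps) {
        int steps = 0;
        for (int i = 0; i < base.size() && steps < maxSteps; i++) {
            target.add(base.get(i)); // if target == base, this also grows base.size()
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> other = new ArrayList<>();

        // Distinct lists: the loop ends after the two source elements.
        System.out.println(appendAll(source, other, 1000));

        // Same list as source and target: only the artificial cap stops the loop.
        System.out.println(appendAll(source, source, 1000));
    }
}
```

With distinct lists the loop performs one iteration per source element; with the same list on both sides it only stops at the cap, which mirrors the unbounded memory growth described in the question.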

If one extends the check at the top of cloneMerge

          if( base == null )
          {
              return;
          }

to

          if( base == null || base == target )
          {
              return;
          }

the endless loop is prevented.

One has to check, though, whether this has unwanted side effects.
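The effect of such an identity check can be illustrated outside PDFBox as well. This is a sketch under the same assumptions as a plain-list model of the problem: GuardedMergeDemo and mergeInto are hypothetical names, and java.util lists stand in for COSArray, so it says nothing about side effects inside PDFBox itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GuardedMergeDemo {

    /**
     * Appends all elements of base to target, but returns immediately
     * when base is null or is the very same object as target.
     */
    static void mergeInto(List<String> base, List<String> target) {
        if (base == null || base == target) {
            return; // nothing to merge, and self-merge would never terminate
        }
        for (int i = 0; i < base.size(); i++) {
            target.add(base.get(i));
        }
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> other = new ArrayList<>();

        mergeInto(source, source); // guarded: source is left unchanged
        mergeInto(source, other);  // distinct lists: normal merge

        System.out.println(source.size());
        System.out.println(other.size());
    }
}
```

Note that the guard compares references with ==, not contents, so two different arrays that happen to hold equal elements are still merged as before.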

Upvotes: 2
