idungotnosn

Reputation: 2047

PdfCopyForms in iTextSharp causing a StackOverflowException

In this method I am trying to take the input fields from one PDF document, add them to another document, and write the result out as a PDF file. The result should be a new PDF file that has the input fields of the first PDF and the static content of the second PDF.

I wrote some code that I thought would perform this task, but I run into a StackOverflowException every time "copier.Close()" is executed. This is the error that it throws:

An unhandled exception of type 'System.StackOverflowException' occurred in mscorlib.dll

This is the code:

public static void AddFormFieldsFromSource(string sourcePath, string secondSourcePath, string targetPath) {
  lock (syncLock) {
    // Let PdfReader open documents without the owner password.
    PdfReader.unethicalreading = true;

    // Source of the form fields.
    PdfReader readerMain = new PdfReader(sourcePath);

    FileStream stream = new FileStream(targetPath, FileMode.Create, FileAccess.Write);
    PdfCopyForms copier = new PdfCopyForms(stream);

    // Source of the static content.
    PdfReader secondSourceReader = new PdfReader(secondSourcePath);

    copier.AddDocument(secondSourceReader);
    copier.CopyDocumentFields(readerMain);

    copier.Close(); // the StackOverflowException is thrown here
    secondSourceReader.Close();
    readerMain.Close();
  }
}

sourcePath points to the PDF that supplies the input fields, and secondSourcePath points to the PDF that supplies the static content.

The PDF I use for the sourcePath variable is located here: https://www.dropbox.com/s/qcc6ug8oohqvmca/primarytwopages2.pdf

The PDF I use for the secondSourcePath variable is located here: https://www.dropbox.com/s/kx2rlhmizh46hl7/secondarytwopages.pdf

I am using iTextSharp version 5.5.0.

Any idea why it is throwing the StackOverflowException? I don't make any recursive calls in my code. My first guess is that I am going about this task incorrectly. The other possibility is that iTextSharp has a bug.

UPDATE: I downloaded the source code of the latest revision of iTextSharp (5.5.1), built a DLL so I could debug, and referenced that DLL in my code. The stack overflow appears to surface in the class PdfIndirectReference, in this constructor:

public class PdfIndirectReference : PdfObject {
    ....
    internal PdfIndirectReference(int type, int number, int generation) : base(0, new StringBuilder().Append(number).Append(' ').Append(generation).Append(" R").ToString()) {
        this.number = number;
        this.generation = generation;
    }

In the call stack of the DLL code, I found that iTextSharp.text.pdf.PdfCopyFieldsImp.Propagate() calls itself recursively over and over again. This must be why the stack overflow occurs.

So the overflow does not occur in my code but inside the DLL. Any idea how to get around this?

Upvotes: 0

Views: 1813

Answers (1)

mkl

Reputation: 95963

I reproduced the issue using iText and Java; the same stack overflow occurs there, so the cause is quite likely the same.

PdfCopyForms internally uses PdfCopyFormsImp, which is derived from PdfCopyFieldsImp. This latter class provides the base methods that do the heavy lifting of field and form copying, among them propagate, which the OP found many times over in the call stack when the stack overflow occurred.

Contrary to the impression left by the observed stack overflow, PdfCopyFieldsImp does have a mechanism to prevent endless loops by marking objects already visited:

/**
 * Sets a reference to "visited" in the copy process.
 * @param   ref the reference that needs to be set to "visited"
 * @return  true if the reference was set to visited
 */
protected boolean setVisited(PRIndirectReference ref) {
    IntHashtable refs = visited.get(ref.getReader());
    if (refs != null)
        return refs.put(ref.getNumber(), 1) != 0;
    else
        return false;
}

This method at the same time marks an object reference from some PdfReader as visited and returns whether or not it has been visited before.

At least it does so for references from PdfReader instances that have an entry in the visited mapping. For references from PdfReader instances without such an entry, it always claims that they have not been visited yet (it returns false). Thus, references from those latter readers are never recognized as visited, no matter how often they are encountered!

PdfReader instances get an entry in the visited mapping only in one code location: Only readers added to the copy using addDocument get it.

Using PdfCopyForms to add the form fields from one document to some other PDF, one obviously does not use addDocument for the reader with the form to copy but instead copyDocumentFields. Thus, loop prevention does not work here.
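The failure mode described above can be reproduced in a small, self-contained Java model. This is only a sketch and not iText code: the class VisitedSketch, the string keys standing in for readers, and the three-object cyclic graph are made up for illustration, and a depth limit stands in for the finite call stack. Only the shape of setVisited follows the method quoted earlier.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these names are iText API.
final class VisitedSketch {
    // Object i references object NEXT[i]; 1 -> 2 -> 3 -> 1 forms a cycle (index 0 is unused).
    private static final int[] NEXT = {0, 2, 3, 1};

    // Stand-in for the "visited" mapping: reader key -> object numbers already seen.
    private final Map<String, Set<Integer>> visited = new HashMap<>();

    // Stand-in for the registration that only addDocument performs.
    void register(String reader) {
        visited.put(reader, new HashSet<>());
    }

    // Same shape as the quoted setVisited: marks the object and reports whether it
    // had been seen before, but always answers false for an unregistered reader.
    boolean setVisited(String reader, int number) {
        Set<Integer> refs = visited.get(reader);
        if (refs != null)
            return !refs.add(number); // add() returns false if the number was already present
        return false;
    }

    // Follows references recursively. Returns true if the walk ends on its own,
    // false if it hits the depth limit (where real code would overflow the stack).
    boolean propagate(String reader, int number, int depthLeft) {
        if (depthLeft == 0)
            return false;
        int next = NEXT[number];
        if (setVisited(reader, next))
            return true; // already seen: do not descend again
        return propagate(reader, next, depthLeft - 1);
    }

    public static void main(String[] args) {
        VisitedSketch unregistered = new VisitedSketch();
        System.out.println(unregistered.propagate("formReader", 1, 1000)); // false

        VisitedSketch registered = new VisitedSketch();
        registered.register("formReader");
        System.out.println(registered.propagate("formReader", 1, 1000)); // true
    }
}
```

With an unregistered reader the walk runs into the depth limit, which is the model's stand-in for the stack overflow. Once the reader has an entry in the mapping, the walk stops as soon as the cycle closes.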

By adding an entry in the visited mapping for the reader from which the form is copied, one can prevent the stack overflow. I did so in PdfCopyFormsImp.copyDocumentFields:

public void copyDocumentFields(PdfReader reader) throws DocumentException {
    if (!reader.isOpenedWithFullPermissions())
        throw new IllegalArgumentException(MessageLocalization.getComposedMessage("pdfreader.not.opened.with.owner.password"));
    if (readers2intrefs.containsKey(reader)) {
        reader = new PdfReader(reader);
    }
    else {
        if (reader.isTampered())
            throw new DocumentException(MessageLocalization.getComposedMessage("the.document.was.reused"));
        reader.consolidateNamedDestinations();
        reader.setTampered(true);
    }
    reader.shuffleSubsetNames();
    readers2intrefs.put(reader, new IntHashtable());

    visited.put(reader, new IntHashtable()); // <<< added line: register the reader for loop detection

    fields.add(reader.getAcroFields());
    updateCalculationOrder(reader);
}

In iTextSharp the analogous change would be in PdfCopyFormsImp.CopyDocumentFields:

    virtual public void CopyDocumentFields(PdfReader reader) {
        if (!reader.IsOpenedWithFullPermissions)
            throw new BadPasswordException(MessageLocalization.GetComposedMessage("pdfreader.not.opened.with.owner.password"));
        if (readers2intrefs.ContainsKey(reader)) {
            reader = new PdfReader(reader);
        }
        else {
            if (reader.Tampered)
                throw new DocumentException(MessageLocalization.GetComposedMessage("the.document.was.reused"));
            reader.ConsolidateNamedDestinations();
            reader.Tampered = true;
        }
        reader.ShuffleSubsetNames();
        readers2intrefs[reader] = new IntHashtable();

        visited[reader] = new IntHashtable(); // <<< added line: register the reader for loop detection

        fields.Add(reader.AcroFields);
        UpdateCalculationOrder(reader);
    }

Disclaimer: I have not checked whether PdfCopyForms works exactly as required after this change. I merely tested it in Java and observed only that the stack overflow no longer occurs and that the resulting PDF in the OP's use case looks OK.

Upvotes: 3
