Wbmstrmjb

Reputation: 105

iText PDFReader Extremely Slow To Open

I have some code that combines a few pages of AcroForms (with their AcroFields intact) and then, at the end, adds some JS to the entire document.

It is the PdfReader in the function that adds the JS that takes extremely long to instantiate (about 12 seconds for a 1 MB file).

Here is the code (pretty simple):

public static byte[] AddJavascript(byte[] document, string js)
{
    PdfReader reader = new PdfReader(new RandomAccessFileOrArray(document), null);
    MemoryStream msOutput = new MemoryStream();
    PdfStamper stamper = new PdfStamper(reader, msOutput);
    PdfWriter writer = stamper.Writer;

    writer.AddJavaScript(js);

    stamper.Close();
    reader.Close();

    byte[] withJS = msOutput.GetBuffer();
    return withJS;
}

I have benchmarked the above and the line that is slow is the first one. I have tried reading it from a file instead of memory and tried using a MemoryStream instead of the RandomAccessFileOrArray. Nothing makes it any faster.

If I add JS to a single-page document, it is very fast. So my guess is that the code that combines the pages somehow produces a PDF that PdfReader is slow to read.

Here is the combine code:

public static byte[] CombineFiles(List<byte[]> sourceFiles)
{
    MemoryStream output = new MemoryStream();

    PdfCopyFields copier = new PdfCopyFields(output);

    try
    {
        output.Position = 0;

        foreach (var fileBytes in sourceFiles)
        {
            PdfReader fileReader = new PdfReader(fileBytes);

            copier.AddDocument(fileReader);
        }
    }
    catch (Exception exception)
    {
        //throw
    }
    finally
    {
        copier.Close();
    }

    byte[] concat = output.GetBuffer();

    return concat;
}

I am using PdfCopyFields because I need to preserve the form fields, so I cannot use PdfCopy or PdfSmartCopy. This combine code is very fast (a few ms) and produces working documents. The AddJavascript method above is called after it, and opening the PdfReader is the slow part.

Any ideas?

Upvotes: 1

Views: 2036

Answers (2)

Chris Haas

Reputation: 55427

(comment converted to answer)

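Using GetBuffer() on a MemoryStream will occasionally create corrupt PDFs. GetBuffer() returns the stream's entire internal buffer, including capacity that was allocated but never written, so the array can end with a run of zero bytes after the PDF's final %%EOF marker. ToArray() copies only the bytes actually written, so it should always be used instead. More information on this can be found here.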
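Java has no MemoryStream, but java.nio.ByteBuffer shows the same difference: array() returns the whole backing array, unused capacity included, while copying up to position() returns only the written bytes. The sketch below is a hypothetical illustration (the class BufferPadding, the method bytesAfterLastEof and the placeholder PDF bytes are made up for this example). It counts how many bytes follow the last %%EOF marker, which is one quick way to check whether a byte array handed to PdfReader carries this kind of padding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BufferPadding {

    // Returns how many bytes follow the last "%%EOF" marker,
    // or -1 if the marker does not occur at all.
    static int bytesAfterLastEof(byte[] data) {
        byte[] marker = "%%EOF".getBytes(StandardCharsets.US_ASCII);
        // Scan backwards so the last occurrence is found first.
        for (int i = data.length - marker.length; i >= 0; i--) {
            if (Arrays.equals(data, i, i + marker.length, marker, 0, marker.length)) {
                return data.length - (i + marker.length);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for a PDF; not a real document.
        byte[] pdf = "%PDF-1.4\n% placeholder body\n%%EOF\n".getBytes(StandardCharsets.US_ASCII);

        // Over-allocate, as a growable in-memory stream typically does.
        ByteBuffer buffer = ByteBuffer.allocate(64);
        buffer.put(pdf);

        // Analogous to MemoryStream.GetBuffer(): the whole backing array.
        byte[] whole = buffer.array();
        // Analogous to MemoryStream.ToArray(): only the bytes written so far.
        byte[] exact = Arrays.copyOf(buffer.array(), buffer.position());

        System.out.println(whole.length + " " + bytesAfterLastEof(whole)); // prints "64 31"
        System.out.println(exact.length + " " + bytesAfterLastEof(exact)); // prints "34 1"
    }
}
```

A possible (unconfirmed in this thread) link to the slow constructor: if a reader looks for the startxref keyword only near the end of the file, a long run of trailing zeros can hide it, and the reader may then fall back to rebuilding the cross-reference table by scanning the whole file.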

Upvotes: 2

Bruno Lowagie

Reputation: 77528

As documented, PdfCopyFields is indeed slow. However, PdfCopyFields is either deprecated or about to be deprecated in favor of PdfCopy. There are two examples in the sandbox that show how it's done: MergeForms (copying forms without renaming the fields) and MergeForms2 (copying forms after renaming the fields). This is what MergeForms looks like:

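import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfReader;
import java.io.FileOutputStream;
import java.io.IOException;
public class MergeForms {
    public void createPdf(String filename, PdfReader[] readers) throws IOException, DocumentException {
        Document document = new Document();
        PdfCopy copy = new PdfCopy(document, new FileOutputStream(filename));
        // Tell PdfCopy to keep and merge the form fields of the added documents
        copy.setMergeFields();
        document.open();
        for (PdfReader reader : readers) {
            copy.addDocument(reader);
        }
        document.close();
        for (PdfReader reader : readers) {
            reader.close();
        }
    }
}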

Note that you need a recent iText version to run this code.

Upvotes: 1
