Reputation: 21

OutOfMemoryError during the pdf merge

The code below merges PDF files and returns the combined PDF data. When I run it to combine 100 files of roughly 500 KB each, I get an OutOfMemoryError at the line document.close();. The code runs in a web environment; is the memory available to the WebSphere server the problem? I read in an article that I should use the freeReader method, but I cannot work out how to use it in my scenario.

protected ByteArrayOutputStream joinPDFs(List<InputStream> pdfStreams,
        boolean paginate) {

    Document document = new Document();

    ByteArrayOutputStream mergedPdfStream = new ByteArrayOutputStream();

    try {
        //List<InputStream> pdfs = pdfStreams;
        List<PdfReader> readers = new ArrayList<PdfReader>();
        int totalPages = 0;
        //Iterator<InputStream> iteratorPDFs = pdfs.iterator();
        Iterator<InputStream> iteratorPDFs = pdfStreams.iterator();

        // Create Readers for the pdfs.
        while (iteratorPDFs.hasNext()) {
            InputStream pdf = iteratorPDFs.next();
            if (pdf == null)
                continue;
            PdfReader pdfReader = new PdfReader(pdf);
            readers.add(pdfReader);
            totalPages += pdfReader.getNumberOfPages();
        }

        //clear this
        pdfStreams = null;

        //WeakReference ref = new WeakReference(pdfs);
        //ref.clear();

        // Create a writer for the outputstream
        PdfWriter writer = PdfWriter.getInstance(document, mergedPdfStream);
        writer.setFullCompression();

        document.open();
        BaseFont bf = BaseFont.createFont(BaseFont.HELVETICA,
                BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
        PdfContentByte cb = writer.getDirectContent(); // Holds the PDF
        // data

        PdfImportedPage page;
        int currentPageNumber = 0;
        int pageOfCurrentReaderPDF = 0;
        Iterator<PdfReader> iteratorPDFReader = readers.iterator();

        // Loop through the PDF files and add to the output.
        while (iteratorPDFReader.hasNext()) {
            PdfReader pdfReader = iteratorPDFReader.next();

            // Create a new page in the target for each source page.
            while (pageOfCurrentReaderPDF < pdfReader.getNumberOfPages()) {
                pageOfCurrentReaderPDF++;
                document.setPageSize(pdfReader
                        .getPageSizeWithRotation(pageOfCurrentReaderPDF));
                document.newPage();
                // pageOfCurrentReaderPDF++;
                currentPageNumber++;
                page = writer.getImportedPage(pdfReader,
                        pageOfCurrentReaderPDF);
                cb.addTemplate(page, 0, 0);

                // Code for pagination.
                if (paginate) {
                    cb.beginText();
                    cb.setFontAndSize(bf, 9);
                    cb.showTextAligned(PdfContentByte.ALIGN_CENTER, ""
                            + currentPageNumber + " of " + totalPages, 520,
                            5, 0);
                    cb.endText();
                }
            }
            pageOfCurrentReaderPDF = 0;
            System.out.println("now the size is: "+pdfReader.getFileLength());
        }
        mergedPdfStream.flush();
        document.close();
        mergedPdfStream.close();
        return mergedPdfStream;
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (document.isOpen())
            document.close();
        try {
            if (mergedPdfStream != null)
                mergedPdfStream.close();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
    return mergedPdfStream;
}

Thanks V

Upvotes: 2

Answers (4)

Parth

Reputation: 1281

This is not the proper way to do file operations. You are merging the files using an ArrayList and arrays in memory. You should use file I/O with buffering instead.

Do you want to show the final merged file at the end? Then you can open the file after all the merging is done.

Do not rely only on in-memory buffering as you have shown. Use file I/O with buffering (a byte[], I mean).
Close each file after you read it and append it.
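A minimal sketch of the buffered file I/O idea, using only the JDK. The class and method names (`PdfSpooler`, `spoolToTempFiles`) are hypothetical and not part of the question's code. It copies each incoming stream to its own temporary file through one small reused `byte[]` and closes the stream immediately, so the raw inputs do not all sit on the heap at once. Note that it only spools the inputs; the actual merge still has to go through a PDF library, because PDF files cannot be joined by concatenating their bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class PdfSpooler {

    /** Copies each non-null stream to its own temp file and closes the stream right away. */
    static List<Path> spoolToTempFiles(List<InputStream> pdfStreams) throws IOException {
        List<Path> files = new ArrayList<>();
        byte[] buffer = new byte[8192]; // one small buffer reused for every copy
        for (InputStream in : pdfStreams) {
            if (in == null) {
                continue; // same null handling as the question's loop
            }
            Path tmp = Files.createTempFile("merge-src-", ".pdf");
            // try-with-resources closes both the source and the target file
            try (InputStream src = in; OutputStream out = Files.newOutputStream(tmp)) {
                int n;
                while ((n = src.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            }
            files.add(tmp);
        }
        return files;
    }
}
```

Each returned path could then be opened one at a time by the PDF library and deleted once its pages have been copied.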

Java has only the limited memory you allocated at startup, so merging a large number of files at once like this will crash the application. You should try running the merge in a separate thread using a thread pool, so that your application does not get stuck on it.

thanks.

Upvotes: 1

Ingo

Reputation: 36339

First, why do you clutter your code with all that Iterator<> boilerplate? Have you ever heard of the for statement? That is:

for (PdfReader pdfReader : readers) {
    // code for each single PdfReader in readers
}

Second, consider closing each pdfReader as soon as you are done with it. This will hopefully flush some buffers and free the memory occupied by the original PDF.
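Here is a dependency-free sketch of that "open, use, close, then move on" pattern with a for-each loop. `Source` is a hypothetical stand-in for a PDF reader, used only so the snippet compiles without iText; the counters exist just to make visible that no more than one source is open at any time. In iText the analogous calls would presumably be pdfReader.close() and the writer.freeReader(pdfReader) method the question mentions, invoked once a reader's pages have been imported; treat that mapping as an assumption to check against your iText version.

```java
import java.util.List;

class OneAtATime {

    // Hypothetical stand-in for a PDF reader; not an iText class.
    static class Source implements AutoCloseable {
        static int open = 0;    // sources currently open
        static int maxOpen = 0; // highest number open at the same time

        final String name;

        Source(String name) {
            this.name = name;
            open++;
            maxOpen = Math.max(maxOpen, open);
        }

        int pages() {
            return name.length(); // dummy page count for the sketch
        }

        @Override
        public void close() {
            open--;
        }
    }

    /** Opens each source only for as long as it is needed. */
    static int mergeAll(List<String> names) {
        int totalPages = 0;
        for (String name : names) {
            // the source is closed before the next one is opened
            try (Source source = new Source(name)) {
                totalPages += source.pages();
            }
        }
        return totalPages;
    }
}
```

The difference from the question's code is that readers are not collected in a list first; each one is created inside the loop and released before the next iteration.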

Upvotes: 1

Pierre Henry

Reputation: 17487

This code merges all the PDFs in an array in memory (the heap), so yes, memory usage will grow linearly with the number of files merged.

I don't know about the freeReader method, but maybe you could try writing the merged PDF to a temporary file instead of a byte array? mergedPdfStream would then be a FileOutputStream instead of a ByteArrayOutputStream, and you would return, e.g., a File reference to the client code.
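A rough sketch of the temporary-file variant, JDK only. `TempFileMerge` and the `MergeBody` callback are hypothetical names; the callback stands in for the iText merge logic from the question, which would receive the file-backed stream instead of the ByteArrayOutputStream.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class TempFileMerge {

    // Hypothetical callback standing in for the actual merge code.
    interface MergeBody {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Runs the merge against a file-backed stream and returns the resulting file. */
    static File mergeToTempFile(MergeBody body) throws IOException {
        File target = File.createTempFile("merged-", ".pdf");
        // Output goes to disk through a small buffer instead of growing on the heap.
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
            body.writeTo(out);
        }
        return target;
    }
}
```

In the question's method, the stream passed to PdfWriter.getInstance(document, mergedPdfStream) would be this `out`, and the caller would stream the returned File to the client and delete it afterwards.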

Or you could increase the amount of memory Java can use (the -Xmx JVM parameter), but if the number of files to merge keeps increasing, you will run into the same problem again.

Upvotes: 3

hudolejev

Reputation: 6018

100 files * 500 kB is around 50 MB. If the maximum heap size is 64 MB, I'm pretty sure this code won't work under such conditions.
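To check this estimate against a real server, something like the following could be logged at startup. It is only a sketch: `HeapCheck` is a hypothetical name, and the estimate covers just the raw input bytes, not the reader objects or the merged output that also live on the heap.

```java
class HeapCheck {

    /** Raw size of the inputs in bytes, e.g. 100 files of 500 KB each. */
    static long estimatedInputBytes(int fileCount, long bytesPerFile) {
        return fileCount * bytesPerFile;
    }

    /** Upper bound on the heap this JVM will try to use (governed by -Xmx). */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** True if the raw inputs alone would already take more than half the heap. */
    static boolean looksRisky(int fileCount, long bytesPerFile) {
        return estimatedInputBytes(fileCount, bytesPerFile) > maxHeapBytes() / 2;
    }
}
```

The "half the heap" threshold is an arbitrary assumption for illustration; the point is simply to compare the input volume with Runtime.getRuntime().maxMemory() before attempting an in-memory merge.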

Upvotes: 0
