Reputation: 21
The code below merges PDF files and returns the combined PDF data. The code runs, but when I try to combine 100 files of approximately 500 KB each, I get an OutOfMemoryError on the line document.close(). This code runs in a web environment; could the memory available to the WebSphere server be the problem? I read in an article that I should use the freeReader method, but I cannot work out how to use it in my scenario.
protected ByteArrayOutputStream joinPDFs(List<InputStream> pdfStreams,
        boolean paginate) {
    Document document = new Document();
    ByteArrayOutputStream mergedPdfStream = new ByteArrayOutputStream();
    try {
        List<PdfReader> readers = new ArrayList<PdfReader>();
        int totalPages = 0;
        Iterator<InputStream> iteratorPDFs = pdfStreams.iterator();
        // Create readers for the PDFs.
        while (iteratorPDFs.hasNext()) {
            InputStream pdf = iteratorPDFs.next();
            if (pdf == null)
                continue;
            PdfReader pdfReader = new PdfReader(pdf);
            readers.add(pdfReader);
            totalPages += pdfReader.getNumberOfPages();
        }
        // Clear this.
        pdfStreams = null;
        // Create a writer for the output stream.
        PdfWriter writer = PdfWriter.getInstance(document, mergedPdfStream);
        writer.setFullCompression();
        document.open();
        BaseFont bf = BaseFont.createFont(BaseFont.HELVETICA,
                BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
        PdfContentByte cb = writer.getDirectContent(); // holds the PDF data
        PdfImportedPage page;
        int currentPageNumber = 0;
        int pageOfCurrentReaderPDF = 0;
        Iterator<PdfReader> iteratorPDFReader = readers.iterator();
        // Loop through the PDF files and add them to the output.
        while (iteratorPDFReader.hasNext()) {
            PdfReader pdfReader = iteratorPDFReader.next();
            // Create a new page in the target for each source page.
            while (pageOfCurrentReaderPDF < pdfReader.getNumberOfPages()) {
                pageOfCurrentReaderPDF++;
                document.setPageSize(pdfReader
                        .getPageSizeWithRotation(pageOfCurrentReaderPDF));
                document.newPage();
                currentPageNumber++;
                page = writer.getImportedPage(pdfReader,
                        pageOfCurrentReaderPDF);
                cb.addTemplate(page, 0, 0);
                // Code for pagination.
                if (paginate) {
                    cb.beginText();
                    cb.setFontAndSize(bf, 9);
                    cb.showTextAligned(PdfContentByte.ALIGN_CENTER, ""
                            + currentPageNumber + " of " + totalPages, 520,
                            5, 0);
                    cb.endText();
                }
            }
            pageOfCurrentReaderPDF = 0;
            System.out.println("now the size is: " + pdfReader.getFileLength());
        }
        mergedPdfStream.flush();
        document.close();
        mergedPdfStream.close();
        return mergedPdfStream;
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (document.isOpen())
            document.close();
        try {
            if (mergedPdfStream != null)
                mergedPdfStream.close();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
    return mergedPdfStream;
}
Thanks V
Upvotes: 2
Views: 3672
Reputation: 1281
This is not the proper way to do this file operation. You are merging the files entirely in memory, using an ArrayList and an array (a byte[], I mean, inside the ByteArrayOutputStream). You should instead use file I/O with buffering techniques.
Do you wish to show the final merged file at the end? Then you can open the file after all your merging is done.
Java has a limited amount of memory, allocated at startup time, so merging a large number of files at once like this will lead to the application crashing. You should also run the merge operation in a separate thread using a thread pool, so that your application does not get stuck.
Thanks.
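A minimal sketch of the thread-pool suggestion, using java.util.concurrent; the mergePdfs method here is a stand-in for the actual iText merge logic, not the asker's real method:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MergeTask {
    // Stand-in for the real iText merge; assumed to return the merged bytes.
    static byte[] mergePdfs() {
        return new byte[] { 0x25, 0x50, 0x44, 0x46 }; // "%PDF" header bytes
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Submit the merge as a background task; the caller can keep serving
        // requests and fetch the result from the Future when it is needed.
        Future<byte[]> merged = pool.submit(MergeTask::mergePdfs);
        byte[] result = merged.get(); // blocks only at the point of use
        System.out.println("merged " + result.length + " bytes");
        pool.shutdown();
    }
}
```

Note that a thread pool keeps the request thread responsive but does not by itself reduce memory use; the buffering and temp-file advice still applies.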
Upvotes: 1
Reputation: 36339
First, why do you clutter your code with all that Iterator<> boilerplate? Have you never heard of the for-each statement?
e.g.
for (PdfReader pdfReader : readers) {
    // code for each single PDF reader in readers
}
Second: consider closing each pdfReader as soon as it is done. This will hopefully flush some buffers and free the memory occupied by the original PDF.
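A sketch of the simplified inner loop, assuming iText 2.x, where PdfWriter.freeReader(reader) releases the pages the writer has buffered for a reader; this is presumably the freeReader method the question mentions. It is a fragment meant to drop into the asker's existing while loop, not a complete program:

```java
// Assumes iText 2.x: freeReader() writes out the pages already imported
// from this reader and frees the memory the writer holds for them.
for (PdfReader pdfReader : readers) {
    for (int p = 1; p <= pdfReader.getNumberOfPages(); p++) {
        document.setPageSize(pdfReader.getPageSizeWithRotation(p));
        document.newPage();
        currentPageNumber++;
        PdfImportedPage page = writer.getImportedPage(pdfReader, p);
        cb.addTemplate(page, 0, 0);
    }
    writer.freeReader(pdfReader); // release pages buffered for this reader
    pdfReader.close();            // then close the reader itself
}
```

With this, only one source PDF's data needs to stay in memory at a time instead of all 100.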
Upvotes: 1
Reputation: 17487
This code merges all the PDFs into an array in memory (the heap), so yes, memory usage will grow linearly with the number of files merged.
I don't know about the freeReader method, but maybe you could try writing the merged PDF into a temporary file instead of a byte array: mergedPdfStream would be a FileOutputStream instead of a ByteArrayOutputStream. Then you return, e.g., a File reference to the client code.
Or you could increase the amount of memory Java can use (the -Xmx JVM parameter), but if the number of files to merge eventually increases, you will find yourself with the same problem again.
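A minimal sketch of the temp-file idea with plain java.io; the byte array passed in stands in for the iText output, and with iText you would instead hand the buffered FileOutputStream directly to PdfWriter.getInstance so pages are streamed to disk as they are written:

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class TempFileMerge {
    // Write the merged output to a temp file instead of holding it on the heap.
    static File mergeToTempFile(byte[] mergedBytes) throws IOException {
        File out = File.createTempFile("merged-", ".pdf");
        // Buffered stream: data goes to disk in chunks, not into a growing byte[].
        try (OutputStream os = new BufferedOutputStream(new FileOutputStream(out))) {
            os.write(mergedBytes);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        byte[] fake = "%PDF-1.4 fake merged content".getBytes("US-ASCII");
        File f = mergeToTempFile(fake);
        System.out.println(f.length() + " bytes written"); // prints "28 bytes written"
        f.deleteOnExit();
    }
}
```

The servlet can then stream the temp file to the response and delete it afterwards, so heap usage stays flat regardless of how many files are merged.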
Upvotes: 3
Reputation: 6018
100 files * 500 kB is somewhere around 50 MB. If the maximum heap size is 64 MB, I'm pretty sure this code won't work under such conditions.
Upvotes: 0