Apache PDFBox Merge Error - java.io.IOException: Missing root object specification in trailer

Question

I am trying to merge two existing PDF documents, both available as InputStreams, using the PDFMergerUtility.mergeDocuments() method in PDFBox. Here's my code; the entry method is pullDocumentsIntoSystem():

private boolean pullDocumentsIntoSystem(final String id, final String filePathAndName, final List<Letter> parsedLetters)
        throws IOException {

    final List<InputStream> pdfStreams = new ArrayList<>();
    final ByteArrayOutputStream mergedPdfOutputStream = new ByteArrayOutputStream();

    // make a call to retrieve each document
    for (final Letter letter : parsedLetters) {
        pdfStreams.add(this.getSpecificDocument(letter.getKey(), id));
    }

    // merge all the documents together
    this.mergePdfDocuments(pdfStreams, mergedPdfOutputStream);

    // write file to directory
    this.writeMergedPdfDocument(mergedPdfOutputStream, filePathAndName); //...more code below...

}

private InputStream getSpecificDocument(final String id, final String key) throws IOException {

    HttpURLConnection connection = null;
    InputStream pdfStream = null;

    try {
        final String url = this.getBaseURL() + "/letter/" + id + "/documents/" + key;

        connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("GET");
        connection.setRequestProperty("X-Letter-Authentication", this.getAuthenticationHeader());
        connection.setRequestProperty("Accept", "application/pdf");
        connection.setRequestProperty("Content-Type", "application/pdf");
        connection.setDoOutput(true);

        pdfStream = connection.getInputStream();

    }
    finally {
        this.disconnect(connection);
    }

    return pdfStream;
}

private void mergePdfDocuments(final List<InputStream> pdfStreams, final ByteArrayOutputStream mergedPdfOutputStream)
        throws IOException {

    final PDFMergerUtility merger = new PDFMergerUtility();

    merger.addSources(pdfStreams);

    merger.setDestinationStream(mergedPdfOutputStream);
    merger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());  // ERROR THROWN HERE
}

Here's the error I'm receiving on the line with the comment above:

Caused by: java.io.IOException: Missing root object specification in trailer.   
at org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2832) ~[pdfbox-2.0.11.jar:2.0.11]     
at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:173) ~[pdfbox-2.0.11.jar:2.0.11]   
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:220) ~[pdfbox-2.0.11.jar:2.0.11]  
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1144) ~[pdfbox-2.0.11.jar:2.0.11]  
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1060) ~[pdfbox-2.0.11.jar:2.0.11]  
at org.apache.pdfbox.multipdf.PDFMergerUtility.legacyMergeDocuments(PDFMergerUtility.java:379) ~[pdfbox-2.0.11.jar:2.0.11]  
at org.apache.pdfbox.multipdf.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:280) ~[pdfbox-2.0.11.jar:2.0.11]

I am using PDFBox 2.0.11.

Each InputStream in my list comes from a separate HttpURLConnection.getInputStream() call, in case that matters. I have confirmed that the HttpURLConnection calls do return documents.

UPDATE

On the advice of @Tilman Hausherr below, I tested the same functionality without using InputStreams. If I use the PDFMergerUtility.addSource(File source) method instead of PDFMergerUtility.addSources(List), the merge works. So it seems that something is wrong with my InputStreams.

I appreciate any help and am happy to provide more information if needed.

Thanks for your time!

risingTide · Accepted Answer

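In the end this was a silly mistake: I was closing the HttpURLConnection too early. The finally block in getSpecificDocument() disconnected before the returned InputStream had been read, so PDFBox could not read a complete PDF from it at merge time. If I remove the this.disconnect(connection) call at the end of the getSpecificDocument() method, everything works fine.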
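
For anyone who would rather keep the disconnect call, one alternative might be to copy each response fully into memory first and only then disconnect. The following is a minimal sketch under that assumption (it presumes the PDFs are small enough to hold in memory); the names BufferedPdfFetch, bufferFully and fetchPdf are hypothetical and are not part of PDFBox or of the code in the question, and the authentication header from the question is left out for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper for illustration; not part of PDFBox or the original code.
final class BufferedPdfFetch {

    // Copies everything from 'in' into memory and returns a stream over that copy,
    // so the result stays readable after the original source has been closed.
    static InputStream bufferFully(final InputStream in) throws IOException {
        final ByteArrayOutputStream copy = new ByteArrayOutputStream();
        final byte[] chunk = new byte[8192];
        int read;
        while ((read = in.read(chunk)) != -1) {
            copy.write(chunk, 0, read);
        }
        return new ByteArrayInputStream(copy.toByteArray());
    }

    // Fetches one PDF and disconnects only after the body has been read completely.
    static InputStream fetchPdf(final String url) throws IOException {
        final HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        try {
            connection.setRequestMethod("GET");
            connection.setRequestProperty("Accept", "application/pdf");
            try (InputStream body = connection.getInputStream()) {
                return bufferFully(body);
            }
        } finally {
            connection.disconnect();
        }
    }

    public static void main(final String[] args) throws IOException {
        // In-memory stand-in for an HTTP response body, so the demo needs no network.
        final InputStream source = new ByteArrayInputStream(new byte[] {1, 2, 3});
        final InputStream buffered = bufferFully(source);
        source.close();
        System.out.println(buffered.available());
    }
}
```

The streams returned this way could be collected into the same pdfStreams list and passed to merger.addSources(pdfStreams) as before.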

Well, hopefully this will help someone else.

Thanks for the leads @Фарид Азаев and @Tilman Hausherr!
