Michael

Reputation: 1397

PDF specification for appending

I am writing some code that needs to take two PDFs and append them at the page level (for example, if both were 2-page documents, the result would be one 4-page document in which all 4 pages are identical to the originals).

Without using a library, what's the best way to do this? Does the PDF specification make this easy?

Upvotes: 3

Views: 239

Answers (1)

Vel Genov

Reputation: 11091

As others have already mentioned, merging two PDF files will be a large undertaking if you don't use a PDF library. You will need a solid understanding of the internal PDF structure. The PDF specification (PDF Reference) is a good place to get started.

Before I go into detail, here is the result of a small experiment in merging two very simple PDF files. The two input files were 34 KB each. The resulting file was 35 KB, and it contained the pages of both input files. That alone shows that there is more going on under the hood than concatenating the code of the two input documents. Comparing the code of the input and output documents also showed that the output had been completely re-created, with different object IDs for each object.

A typical PDF document contains a header, a body, a cross-reference table, and a trailer. When a PDF document is read, the library checks the header at the top, then jumps to the end of the file and works backward to find the trailer, whose startxref entry gives the byte offset of the cross-reference table. The table maps each object in the document to its byte offset in the file. It has to be updated or re-created whenever new objects are added to the document.
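To make the layout concrete, here is a minimal Python sketch that builds a tiny PDF in memory and then reads it the way described above: find startxref at the end of the file, jump to the cross-reference table, and collect the byte offset of each object. The function names build_minimal_pdf and read_xref_table are made up for this illustration. It assumes a classic, uncompressed cross-reference table with a single subsection, which many real files (cross-reference streams, incremental updates) do not have.

```python
def build_minimal_pdf() -> bytes:
    """Assemble a tiny PDF with one blank page: header, body, xref, trailer."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
    ]
    out = bytearray(b"%PDF-1.4\n")                 # header
    offsets = []
    for num, body in enumerate(objects, start=1):  # body
        offsets.append(len(out))                   # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)                            # cross-reference table
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"                # entry 0 is always free
    for off in offsets:
        out += b"%010d 00000 n \n" % off           # each entry is 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos  # pointer back to the table
    return bytes(out)


def read_xref_table(data: bytes) -> dict:
    """Return {object number: byte offset} for in-use objects.

    Assumption: one classic xref subsection, no xref streams.
    """
    idx = data.rfind(b"startxref")                 # search from the end
    if idx < 0:
        raise ValueError("no startxref keyword found")
    pos = int(data[idx + len(b"startxref"):].split()[0])
    lines = data[pos:].splitlines()
    if lines[0].strip() != b"xref":
        raise ValueError("expected a classic cross-reference table")
    first, count = map(int, lines[1].split())
    table = {}
    for i, line in enumerate(lines[2:2 + count]):
        offset, _generation, kind = line.split()
        if kind == b"n":                           # "n" = in use, "f" = free
            table[first + i] = int(offset)
    return table
```

Calling read_xref_table(build_minimal_pdf()) gives one offset per object, and each offset points at the matching "N 0 obj" line in the file. That is exactly the bookkeeping that breaks if bytes are inserted or removed without rewriting the table.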

To merge two documents manually, you have to move the objects from the body of the second document into the first, renumbering them so that their object IDs do not collide with existing ones, and add the new pages to the first document's page tree. Then you can update the metadata of the first document if necessary. The difficult part is updating, and possibly re-creating, the cross-reference table, because the byte offsets of the objects change. You will need to implement a significant portion of the PDF spec to be able to do that.
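As a rough illustration of those steps, here is a toy Python sketch. It is not a real merger. The two "documents" are hypothetical dictionaries of already-parsed object bodies, references are rewritten with a regular expression (unsafe for real files, where the same byte pattern can occur inside strings and streams), and all names (DOC_A, DOC_B, renumber, merge, serialize) are invented for this example. It only shows the three moving parts: renumbering, grafting the second page tree under the first, and regenerating the cross-reference table.

```python
import re

REF = re.compile(rb"(\d+) (\d+) R\b")  # indirect reference, e.g. "3 0 R"

# Hypothetical toy documents: object 1 is the catalog, object 2 the /Pages root.
DOC_A = {
    1: b"<< /Type /Catalog /Pages 2 0 R >>",
    2: b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    3: b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
}
DOC_B = {
    1: b"<< /Type /Catalog /Pages 2 0 R >>",
    2: b"<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >>",
    3: b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
    4: b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
}


def renumber(objects, shift):
    """Shift every object number and every 'N G R' reference by `shift`."""
    def fix(m):
        return b"%d %s R" % (int(m.group(1)) + shift, m.group(2))
    return {num + shift: REF.sub(fix, body) for num, body in objects.items()}


def merge(doc_a, doc_b, pages_a=2, pages_b=2):
    """Append doc_b's pages to doc_a by nesting B's page tree under A's root."""
    shift = max(doc_a)                      # B's numbers start after A's
    moved = renumber(doc_b, shift)
    new_pages_b = pages_b + shift
    count_b = int(re.search(rb"/Count (\d+)", moved[new_pages_b]).group(1))
    # B's /Pages node becomes an intermediate node whose parent is A's root.
    moved[new_pages_b] = moved[new_pages_b].replace(
        b"/Type /Pages", b"/Type /Pages /Parent %d 0 R" % pages_a, 1)
    merged = dict(doc_a)
    merged.update(moved)                    # B's old catalog stays as an orphan
    root = merged[pages_a]
    root = re.sub(rb"/Kids \[(.*?)\]",
                  lambda m: b"/Kids [%s %d 0 R]" % (m.group(1), new_pages_b),
                  root)
    root = re.sub(rb"/Count (\d+)",
                  lambda m: b"/Count %d" % (int(m.group(1)) + count_b), root)
    merged[pages_a] = root
    return merged


def serialize(objects, root=1):
    """Write objects plus a fresh xref table (assumes numbers 1..N, no gaps)."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = {}
    for num in sorted(objects):
        offsets[num] = len(out)             # offsets are recomputed from scratch
        out += b"%d 0 obj\n%s\nendobj\n" % (num, objects[num])
    size = max(objects) + 1
    xref_pos = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % size
    for num in range(1, size):
        out += b"%010d 00000 n \n" % offsets[num]
    out += b"trailer\n<< /Size %d /Root %d 0 R >>\n" % (size, root)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

serialize(merge(DOC_A, DOC_B)) produces a single file whose root /Pages node reports a /Count of 3. Everything a real file adds on top of this (streams, fonts and other shared resources, inherited page attributes, encryption, compressed object streams) is what makes the manual route so much work.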

If you decide to use a library in your project, there are some fairly lightweight options that will do the trick. PDFtk is fairly lightweight and can merge PDFs with one command. It has a free version and a command-line interface. You should be able to set up a simple server to host it in your environment and then call it from JavaScript.
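For reference, the one-command merge uses PDFtk's cat operation: pdftk in1.pdf in2.pdf cat output merged.pdf. Below is a small Python sketch of calling it from a server-side script; the same call works from any language that can spawn a process. The wrapper function names and the file names are placeholders chosen for this example, and it assumes the pdftk executable is installed and on the PATH.

```python
import shutil
import subprocess


def pdftk_merge_command(inputs, output):
    """Build the argument list for: pdftk in1.pdf in2.pdf ... cat output out.pdf"""
    if len(inputs) < 2:
        raise ValueError("need at least two input files to merge")
    return ["pdftk", *inputs, "cat", "output", output]


def merge_with_pdftk(inputs, output):
    """Run pdftk to concatenate the input PDFs, in order, into `output`."""
    if shutil.which("pdftk") is None:
        raise RuntimeError("pdftk is not installed or not on PATH")
    # check=True raises CalledProcessError if pdftk exits with an error.
    subprocess.run(pdftk_merge_command(inputs, output), check=True)
```

For example, merge_with_pdftk(["first.pdf", "second.pdf"], "merged.pdf") would write a file with the pages of first.pdf followed by those of second.pdf.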

If your project requires more than a free library, there is APDFL, a commercial PDF processing library. It has .NET and Java interfaces, so you can easily create a server app that will merge PDF files for you.

Upvotes: 2
