Martin Truchon

Reputation: 51

Export and merge annotations from a PDF to another PDF

I'm looking for a way to export the annotation layer of a PDF and merge it back in another PDF. I've tried using libraries like poppler and PyPDF2, but nothing has worked so far. Are there any open-source libraries that could do what I want?

Upvotes: 5

Views: 1160

Answers (2)

K J

Reputation: 11722

The request asks for an open-source method to

export the annotation layer of a PDF and merge it back in another PDF. Are there any open-source libraries that could do what I want?

So this comes down to a single command line, using any tool that can transfer annotations between files in one command. Several "Open Source" offerings could be recommended for this task. Note that cpdf is open source but not free for commercial use; then again, it is one of the best in its class.

Here I simply copy the annotations from page 1 onto a new blank A4 portrait page, but the same approach works for the annotations of any set of pages of any PDF, copied into any page range of any other PDF.

NOTE: the dimensions are exactly the same in both files, even if the value for the inverted Y looks different.

cpdf -create-pdf -create-pdf-papersize a4portrait -o out.pdf AND -copy-annotations rtl1234.pdf out.pdf -o newout.pdf
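If the copy needs to run from a script rather than by hand, the `-copy-annotations` step can be wrapped in a few lines of Python. This is only a sketch: the helper names `build_copy_annotations_cmd` and `copy_annotations` are made up for illustration, and it assumes the `cpdf` binary is installed and on the PATH. It uses only the flags that appear in the command above.

```python
import shutil
import subprocess
from typing import List


def build_copy_annotations_cmd(source_pdf: str, target_pdf: str, output_pdf: str) -> List[str]:
    """Build the argv for: cpdf -copy-annotations <source> <target> -o <output>."""
    return ["cpdf", "-copy-annotations", source_pdf, target_pdf, "-o", output_pdf]


def copy_annotations(source_pdf: str, target_pdf: str, output_pdf: str) -> None:
    """Run cpdf; raises if the binary is missing or exits with a non-zero status."""
    if shutil.which("cpdf") is None:
        raise FileNotFoundError("cpdf was not found on PATH")
    cmd = build_copy_annotations_cmd(source_pdf, target_pdf, output_pdf)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # File names taken from the command above; this only prints the command.
    print(" ".join(build_copy_annotations_cmd("rtl1234.pdf", "out.pdf", "newout.pdf")))
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with file names that contain spaces.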


Upvotes: 0

Joris Schellekens

Reputation: 9012

Disclaimer: I am the author of borb, the library used in this example.

borb converts a PDF document to an internal JSON-like representation of nested lists, dictionaries and primitives. That means your question comes down to copying a dictionary from one JSON object to another. Should be pretty easy.
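To make the "copy a dictionary from one object to another" idea concrete, here is a toy model that uses plain Python dicts and lists in place of borb's own types. Nothing in it is borb API: `merge_annotations` and the sample pages are hypothetical, and a real annotation dictionary may also hold references back into its source document (for example its parent page) that a naive copy like this does not rewrite.

```python
import copy


def merge_annotations(source_page: dict, target_page: dict) -> dict:
    """Append deep copies of source_page's annotations to target_page.

    Annotations already present on target_page are kept.
    """
    annots = target_page.setdefault("Annots", [])
    for annot in source_page.get("Annots", []):
        # Deep-copy so later edits to one page do not affect the other.
        annots.append(copy.deepcopy(annot))
    return target_page


# Two stand-in "pages", each a dict with an "Annots" list.
page_a = {"Annots": [{"Subtype": "Highlight", "Rect": [10, 20, 110, 40]}]}
page_b = {"Annots": [{"Subtype": "Text", "Rect": [0, 0, 20, 20]}]}

merge_annotations(page_a, page_b)
print(len(page_b["Annots"]))  # prints 2
```

Appending to the existing list, rather than assigning a fresh one, is a deliberate choice here so that annotations already on the target page survive the merge.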

You would need to read the first document:

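from borb.pdf import PDF  # import path as in borb 2.x; may differ in other versions

# Parse the first PDF into borb's in-memory representation
doc_in_a = None
with open("input_a.pdf", "rb") as in_file_handle:
    doc_in_a = PDF.loads(in_file_handle)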

Then you would need to read the second document:

doc_in_b = None
with open("input_b.pdf", "rb") as in_file_handle:
    doc_in_b = PDF.loads(in_file_handle)

And then add all annotations from the first page of a to the first page of b:

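from borb.io.read.types import List, Name  # import path as in borb 2.x

annots = doc_in_a.get_page(0).get_annotations()
# Note: this replaces any annotations already present on page 0 of b
doc_in_b.get_page(0)[Name("Annots")] = List()
for a in annots:
    doc_in_b.get_page(0)["Annots"].append(a)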

Finally, write pdf b:

with open("output.pdf", "wb") as out_file_handle:
    PDF.dumps(out_file_handle, doc_in_b)

You can obtain borb either on GitHub or from PyPI. There are a ton more examples; check them out to find out more.

Upvotes: 1
