Basj

Reputation: 46433

Merging PDF pages fails with PyPDF2

With these demo files,

test.pdf: "Hello"
tomerge1.pdf: "1"
tomerge2.pdf: "2"

in output.pdf, I would like to have two pages: "Hello 1" and "Hello 2".

Here is what I used:

from PyPDF2 import PdfFileWriter, PdfFileReader

outputpdf = PdfFileWriter()
inputpdf = PdfFileReader(open("test.pdf", "rb"))
tomerge1 = PdfFileReader(open("tomerge1.pdf", "rb"))
tomerge2 = PdfFileReader(open("tomerge2.pdf", "rb"))

page = inputpdf.getPage(0)
page.mergePage(tomerge1.getPage(0))
outputpdf.addPage(page)

# exit()
# if we stop here, the output is "Hello 1", which is good
# Why isn't "Hello 1" remembered here?
# del page    # doesn't change anything

page = inputpdf.getPage(0)
page.mergePage(tomerge2.getPage(0))
outputpdf.addPage(page)

with open("output.pdf", "wb") as f:
    outputpdf.write(f)

Sadly, it doesn't work: instead of having "Hello 1" / "Hello 2", the output is: "Hello 2" / "Hello 2".

Question: how can I get the expected behaviour (without the file size growing very fast when there are 10 or 20 pages)?

Upvotes: 0

Views: 914

Answers (1)

Paula Thomas

Reputation: 1190

When I was doing a similar exercise, I found that you need to read once and merge once. The way round this is to set up two readers for the input file ("test.pdf") and merge from each of the two readers. Example code below:

from PyPDF2 import PdfFileWriter, PdfFileReader

addressfile = open("Documents/addresses.pdf", "rb")
xwfile = "Downloads/input.pdf"
crosswordfile = open(xwfile, "rb")

# Two readers over the same input file: read once and merge once per reader
xword = PdfFileReader(crosswordfile)
xw2 = PdfFileReader(crosswordfile)
addr = PdfFileReader(addressfile)

xwpage = xword.getPage(0)
xp2 = xw2.getPage(0)
addpage1 = addr.getPage(1)
addpage2 = addr.getPage(2)

# Each page from the input file is merged exactly once
xwpage.mergePage(addpage1)
xp2.mergePage(addpage2)

pdfWriter = PdfFileWriter()
pdfWriter.addPage(xwpage)
pdfWriter.addPage(xp2)

with open("/home/paula/xw.pdf", "wb") as res:
    pdfWriter.write(res)

crosswordfile.close()
addressfile.close()

So in your code this would be:

testfile = open("test.pdf", "rb")
outputpdf = PdfFileWriter()
inputpdf1 = PdfFileReader(testfile)
inputpdf2 = PdfFileReader(testfile)
tomerge1 = PdfFileReader(open("tomerge1.pdf", "rb"))
tomerge2 = PdfFileReader(open("tomerge2.pdf", "rb"))

page1 = inputpdf1.getPage(0)
page1.mergePage(tomerge1.getPage(0))
outputpdf.addPage(page1)

# exit()
# No need to stop here: the output will have both "Hello 1" and "Hello 2".
# Using two readers for the same file makes PyPDF2 treat them as
# two different files, i.e. as if we were merging from two separate sources.

page2 = inputpdf2.getPage(0)
page2.mergePage(tomerge2.getPage(0))
outputpdf.addPage(page2)

with open("output.pdf", "wb") as f:
    outputpdf.write(f)
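
A likely reason the two-reader trick helps is that a single reader hands back the same page object on every getPage(0) call, and mergePage changes that object in place, so both entries in the writer end up being one and the same page. The sketch below illustrates that idea in plain Python only. FakePage and FakeReader are hypothetical stand-ins invented for this illustration, not PyPDF2 classes, and the assumption that PyPDF2 caches page objects this way is mine rather than something stated above.

```python
import copy


class FakePage:
    """Hypothetical stand-in for a PDF page (not a PyPDF2 class)."""

    def __init__(self, text):
        self.contents = [text]

    def merge(self, other):
        # Changes this page in place by rebinding its contents,
        # the way an in-place page merge would.
        self.contents = self.contents + other.contents


class FakeReader:
    """Hypothetical reader that returns the same cached page object every time."""

    def __init__(self, text):
        self._pages = [FakePage(text)]

    def get_page(self, index):
        return self._pages[index]


# Problem: one reader, the same page fetched and merged twice.
reader = FakeReader("Hello")
output = []

page = reader.get_page(0)
page.merge(FakePage("1"))
output.append(page)

page = reader.get_page(0)  # same object as before, not a fresh page
page.merge(FakePage("2"))
output.append(page)

# Both output entries are one object, so they can never differ.
aliased = output[0] is output[1]

# Possible fix: copy the page before merging, so the cached original stays untouched.
# A shallow copy is enough here because merge() rebinds contents instead of
# appending to the shared list.
reader = FakeReader("Hello")
fixed = []
for extra in ("1", "2"):
    page = copy.copy(reader.get_page(0))
    page.merge(FakePage(extra))
    fixed.append(page)

fixed_texts = [" ".join(p.contents) for p in fixed]
print(aliased, fixed_texts)
```

If the same reasoning holds for PyPDF2, copying the page returned by getPage(0) with copy.copy before calling mergePage may be an alternative to opening a second reader; I have not verified that against the files in the question.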

Upvotes: 1
