With these demo files:

test.pdf: "Hello"
tomerge1.pdf: "1"
tomerge2.pdf: "2"

in output.pdf, I would like to have:

- test.pdf merged with page 1 of tomerge1.pdf, i.e. "Hello 1"
- test.pdf merged with page 1 of tomerge2.pdf, i.e. "Hello 2"

Here is what I used:
from PyPDF2 import PdfFileWriter, PdfFileReader
outputpdf = PdfFileWriter()
inputpdf = PdfFileReader(open("test.pdf", "rb"))
tomerge1 = PdfFileReader(open("tomerge1.pdf", "rb"))
tomerge2 = PdfFileReader(open("tomerge2.pdf", "rb"))
page = inputpdf.getPage(0)
page.mergePage(tomerge1.getPage(0))
outputpdf.addPage(page)
# exit()
# if we stop here, the output is "Hello 1", which is good
# Why isn't "Hello 1" remembered here?
# del page # doesn't change anything
page = inputpdf.getPage(0)
page.mergePage(tomerge2.getPage(0))
outputpdf.addPage(page)
with open("output.pdf", "wb") as f:
    outputpdf.write(f)
Sadly, it doesn't work: instead of having "Hello 1" / "Hello 2", the output is: "Hello 2" / "Hello 2".
Question: how can I get the expected behaviour (without the file size growing very fast when there are 10 or 20 pages)?
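To see why "Hello 1" isn't remembered, here is a toy model in plain Python. The classes `Page`, `CachingReader` and `Writer` are hypothetical stand-ins, not PyPDF2's real API. The assumption they encode is that, as in PyPDF2 1.x, `getPage` returns the same cached page object on every call, `mergePage` modifies that object in place, and the writer keeps a reference to it that is only serialized when `write` is called.

```python
# Toy model (hypothetical classes, NOT PyPDF2's real code) of the aliasing
# behind the question: cached page objects + in-place merging + a writer
# that stores references and only reads them at write time.


class Page:
    def __init__(self, text):
        self.text = text

    def merge_page(self, other):
        # Mutates self in place (assumed to mirror PyPDF2's mergePage).
        self.text = self.text + " " + other.text


class CachingReader:
    def __init__(self, texts):
        self._pages = [Page(t) for t in texts]  # parsed once, then cached

    def get_page(self, n):
        return self._pages[n]  # the same object on every call


class Writer:
    def __init__(self):
        self.pages = []

    def add_page(self, page):
        self.pages.append(page)  # stores a reference, not a copy

    def write(self):
        # Page contents are only read here, after all merges have happened.
        return [p.text for p in self.pages]


inputpdf = CachingReader(["Hello"])
tomerge1 = CachingReader(["1"])
tomerge2 = CachingReader(["2"])
outputpdf = Writer()

page = inputpdf.get_page(0)
page.merge_page(tomerge1.get_page(0))
outputpdf.add_page(page)

page = inputpdf.get_page(0)  # same object as before, already merged once
page.merge_page(tomerge2.get_page(0))
outputpdf.add_page(page)

result = outputpdf.write()
print(result)  # → ['Hello 1 2', 'Hello 1 2']
```

Both writer entries point to one object, so they cannot differ from each other. In this toy the two merges simply accumulate; the point is only that `del page` or re-calling `getPage(0)` on the same reader cannot give you a fresh page.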
I found, when doing a similar exercise, that you need to read once and merge once. The way round this is to set up two readers for the input file ("test.pdf") and merge onto one page from each reader. Example code below:
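from PyPDF2 import PdfFileReader, PdfFileWriter
addressfile = open("Documents/addresses.pdf", "rb")
xwfile = "Downloads/input.pdf"
crosswordfile = open(xwfile, "rb")
# Two readers on the same file, so each hands out its own page objects
xword = PdfFileReader(crosswordfile)
xw2 = PdfFileReader(crosswordfile)
addr = PdfFileReader(addressfile)
xwpage = xword.getPage(0)
addpage1 = addr.getPage(1)
addpage2 = addr.getPage(2)
pdfWriter = PdfFileWriter()
xp2 = xw2.getPage(0)
# Merge a different address page onto each copy of the crossword page
xwpage.mergePage(addpage1)
xp2.mergePage(addpage2)
res = open("/home/paula/xw.pdf", "wb")
pdfWriter.addPage(xwpage)
pdfWriter.addPage(xp2)
pdfWriter.write(res)
res.close()
# Close the inputs only after writing: the readers load page data lazily
crosswordfile.close()
addressfile.close()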
So in your code this would be:
from PyPDF2 import PdfFileWriter, PdfFileReader
testfile = open("test.pdf", "rb")
outputpdf = PdfFileWriter()
inputpdf1 = PdfFileReader(testfile)
inputpdf2 = PdfFileReader(testfile)
tomerge1 = PdfFileReader(open("tomerge1.pdf", "rb"))
tomerge2 = PdfFileReader(open("tomerge2.pdf", "rb"))
page1 = inputpdf1.getPage(0)
page1.mergePage(tomerge1.getPage(0))
outputpdf.addPage(page1)
# No need to stop here: the output will have both "Hello 1" and "Hello 2".
# Using two readers for the same file fools PyPDF2 into thinking they
# are two different files, i.e. that we are merging from two separate sources,
# so page2 below is a different object from page1.
page2 = inputpdf2.getPage(0)
page2.mergePage(tomerge2.getPage(0))
outputpdf.addPage(page2)
with open("output.pdf", "wb") as f:
    outputpdf.write(f)
testfile.close()
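The same trick extends to the 10 or 20 pages mentioned in the question by creating one reader per output page. Below is the same kind of toy model (hypothetical `Page`, `CachingReader` and `Writer` classes, not PyPDF2's API) showing why that works: each fresh reader yields a fresh base page, so every merge lands on its own object.

```python
# Toy model (hypothetical classes, NOT PyPDF2's real code): one reader per
# output page gives every merge its own, unshared base page object.


class Page:
    def __init__(self, text):
        self.text = text

    def merge_page(self, other):
        # Mutates self in place (assumed to mirror PyPDF2's mergePage).
        self.text = self.text + " " + other.text


class CachingReader:
    def __init__(self, texts):
        self._pages = [Page(t) for t in texts]  # each reader parses its own copy

    def get_page(self, n):
        return self._pages[n]  # cached per reader, not shared across readers


class Writer:
    def __init__(self):
        self.pages = []

    def add_page(self, page):
        self.pages.append(page)  # stores a reference, not a copy

    def write(self):
        return [p.text for p in self.pages]


base_texts = ["Hello"]            # stands in for test.pdf
overlays = [["1"], ["2"], ["3"]]  # stand-ins for tomerge1.pdf, tomerge2.pdf, ...

outputpdf = Writer()
for overlay in overlays:
    # A fresh reader for every output page, like inputpdf1 / inputpdf2 above
    base = CachingReader(base_texts).get_page(0)
    base.merge_page(CachingReader(overlay).get_page(0))
    outputpdf.add_page(base)

result = outputpdf.write()
print(result)  # → ['Hello 1', 'Hello 2', 'Hello 3']
```

This only models object identity; it says nothing about how large the real output file gets when many readers are used.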