Reputation: 109
I have created a PDF splitter using PyPDF2. It splits PDFs that are larger than 20 MB into multiple smaller PDFs.
The logic is to split the document into single-page PDFs and find the size of each one, then add up the sizes until 20 MB is reached and split at that point.
The problem is that certain pages, once extracted, are almost as large as the original PDF, although when I extract the same page manually it is only about 500 KB.
I am not sure why the size increases. Please help me resolve this issue.
for i in range(pdf_reader.numPages):
    # Path of the new single-page PDF
    outputpdf = newpath + '\\' + pp.split('.pdf')[0] + 'page' + str(i+1) + '.pdf'
    # PDF writer for this page
    output = PyPDF2.PdfFileWriter()
    # Add the current page to the writer
    output.addPage(pdf_reader.getPage(i))
    # Write the page out to the new PDF
    with open(outputpdf, "wb") as outputStream:
        output.write(outputStream)
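The size-accumulation step described above (add page sizes until the limit is reached, then split) could be sketched roughly as follows. This is only an illustration: the function name `group_pages_by_size` and its parameters are hypothetical, and it assumes the per-page sizes in bytes have already been measured. The sum of single-page file sizes is only an estimate of the size of the merged file.

```python
def group_pages_by_size(page_sizes, limit_bytes):
    """Greedily pack consecutive page indices into groups.

    A new group is started whenever adding the next page would push the
    running total over limit_bytes. A single page larger than the limit
    ends up in a group of its own.
    """
    groups = []
    current = []        # page indices in the group being built
    current_size = 0    # summed size of those pages, in bytes
    for index, size in enumerate(page_sizes):
        if current and current_size + size > limit_bytes:
            groups.append(current)
            current = []
            current_size = 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    MB = 1024 * 1024
    # Hypothetical per-page sizes; in practice these come from the files on disk.
    sizes = [8 * MB, 8 * MB, 8 * MB, 3 * MB]
    print(group_pages_by_size(sizes, 20 * MB))
```

Each returned group is a list of zero-based page indices that would go into one output PDF.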
Upvotes: 1
Views: 491
Reputation: 109
After a lot of trial and error, I found an answer: I used the pdfrw library instead of PyPDF2 to extract each page, and I no longer face this problem.
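For reference, a minimal sketch of what per-page extraction with pdfrw might look like. This is not the exact code I ended up with: the helper names `page_output_path` and `split_pages_pdfrw` are made up for illustration, the output naming simply mirrors the question, and pdfrw is a third-party package that has to be installed separately.

```python
import os


def page_output_path(src_path, page_number, out_dir):
    """Build '<out_dir>/<name>page<N>.pdf' for a 1-based page number."""
    base = os.path.splitext(os.path.basename(src_path))[0]
    return os.path.join(out_dir, f"{base}page{page_number}.pdf")


def split_pages_pdfrw(src_path, out_dir):
    """Write every page of src_path to its own PDF; return the file sizes in bytes."""
    # Imported here so the path helper above works without pdfrw installed.
    from pdfrw import PdfReader, PdfWriter

    os.makedirs(out_dir, exist_ok=True)
    sizes = []
    for number, page in enumerate(PdfReader(src_path).pages, start=1):
        out_path = page_output_path(src_path, number, out_dir)
        writer = PdfWriter()
        writer.addpage(page)      # add just this one page
        writer.write(out_path)    # write the single-page PDF to disk
        sizes.append(os.path.getsize(out_path))
    return sizes
```

The returned list of sizes can then be used to decide where to split, as described in the question.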
Upvotes: 1