Reputation: 79615
I'm using Python 3.8.5. I'm trying to write a short script that concatenates PDF files and learning from this Stack Overflow question, I'm trying to use PyPDF2
. Unfortunately, I can't seem to even create a PyPDF2.PdfFileReader
instance without crashing.
My code looks like this:
import pathlib
import PyPDF2
pdf_path = pathlib.Path('1.pdf')
with pdf_path.open('rb') as pdf_file:
reader = PyPDF2.PdfFileReader(pdf_file, strict=False)
When I try to run it, I get the following traceback:
Traceback (most recent call last):
File "C:\...\pdf\open_pdf.py", line 6, in <module>
reader = PyPDF2.PdfFileReader(pdf_file, strict=False)
File "C:\...\.virtualenvs\pdf-j0HnXL2B\lib\site-packages\PyPDF2\pdf.py", line 1084, in __init__
self.read(stream)
File "C:\...\.virtualenvs\pdf-j0HnXL2B\lib\site-packages\PyPDF2\pdf.py", line 1883, in read
stream.seek(-11, 1)
OSError: [Errno 22] Invalid argument
To help reproduce the problem, I created this GitHub repo with the above code and a sample PDF file.
What am I doing wrong?
Upvotes: 6
Views: 3764
Reputation: 1202
You can accomplish this with PyMuPDF (install - at least on Windows - with pip install pymupdf
). The basic pattern for concatenating files is:
import fitz
doc1 = fitz.Document('filename1.pdf')
doc2 = fitz.Document('filename2.pdf')
combined = fitz.Document() # empty document
combined.insertPDF(doc1)
combined.insertPDF(doc2)
combined.save('combinedfile.pdf')
I tested with your file, and it does issue a warning about the invalid cross reference structure in the PDF, but will work. (The file it creates is valid PDF-1.4)
Upvotes: 2
Reputation: 10227
It seems like your 1.pdf
file fails validation, checked here: https://www.pdf-online.com/osa/validate.aspx
I tried with another pdf file of version 1.7 and it worked, so it's not about pdf version, you just have a bad 1.pdf file
Upvotes: 2
Reputation: 778
the code is good but you need to reduce the pdf size its too large to handle one dummy way to do it is to open the pdf file and press print and in the printers selection use Microsoft print pdf and use this file it should not affect the quality of the file
Upvotes: -1