Reputation: 310
I am trying to merge some PDF files using Python, more specifically PyPDF2. That should be easy enough, but for some reason I get an error which I simply do not understand.
While searching for a solution, I found that other people had this problem as well. However, none of the posted solutions worked for me.
My code for merging the files:
from PyPDF2 import PdfFileMerger

def merge(self, work_files, destination_file):
    pdf_merger = PdfFileMerger()
    for pdf in work_files:
        pdf_merger.append(pdf)
        # also tried the following with the same results:
        # with open(pdf, 'wb') as fileobj:
        #     merger.append(fileobj)
    with open(destination_file, 'wb') as fileobj:
        pdf_merger.write(fileobj)
Here, work_files is a list of paths to the PDFs to merge, and destination_file is the file where the merged PDF is supposed to be saved.
This produces the following error (full stack trace provided as requested):
Traceback (most recent call last):
  File "main.py", line 9, in <module>
    merger.append(fileobj)
  File "/home/user/.local/lib/python3.8/site-packages/PyPDF2/merger.py", line 203, in append
    self.merge(len(self.pages), fileobj, bookmark, pages, import_bookmarks)
  File "/home/user/.local/lib/python3.8/site-packages/PyPDF2/merger.py", line 133, in merge
    pdfr = PdfFileReader(fileobj, strict=self.strict)
  File "/home/user/.local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 1084, in __init__
    self.read(stream)
  File "/home/user/.local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 1689, in read
    stream.seek(-1, 2)
OSError: [Errno 22] Invalid argument
I have tried different ways of supplying the paths: relative paths, absolute paths, and parsing them into another file, all without success.
I am using Python 3.8 on Ubuntu 20.04.
I would be thankful for any help.
Upvotes: 0
Views: 1265
Reputation: 310
After trying out other ways to merge the PDF files, I embarrassingly realized that my test files were actually damaged and could not even be read by the system. Problem solved.
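For anyone who lands here with the same traceback: an empty (zero-byte) file is one way to get exactly this error, because stream.seek(-1, 2) tries to move to one byte before the end of the file, which does not exist. The following is only a rough sketch of a sanity check that could be run before merging. The helper names looks_like_pdf and split_readable are made up for illustration, and checking for the "%PDF-" header is a quick heuristic, not a full validity check.

```python
import os


def looks_like_pdf(path):
    """Return True if the file is non-empty and begins with the PDF header."""
    # A missing or zero-byte file cannot be parsed; seeking to one byte
    # before the end of an empty file is what raises Errno 22.
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as fileobj:  # read-only, binary mode
        return fileobj.read(5) == b"%PDF-"


def split_readable(paths):
    """Separate paths into (usable, suspect) lists using looks_like_pdf."""
    usable = [p for p in paths if looks_like_pdf(p)]
    suspect = [p for p in paths if not looks_like_pdf(p)]
    return usable, suspect
```

Calling split_readable(work_files) before the merge loop would show which inputs are worth a closer look. A file that passes this check can still be corrupt further in.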
Upvotes: 0
Reputation: 119
If work_files is only a list of paths, it means you are only passing strings as input to the append method, one at a time. According to the PdfFileMerger documentation, you need to pass file objects as input to the append method.
fileobj – A File Object or an object that supports the standard read and seek methods similar to a File Object. Could also be a string representing a path to a PDF file
Sorry, I overlooked the last part of the documentation, but have you actually tried passing file objects? Also, maybe try getting your file names with glob.glob('*.pdf'). If you could post the full stack trace of the error, it would be helpful too.
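To make both suggestions concrete, here is a minimal sketch, assuming a PyPDF2 release that still provides PdfFileMerger (as in the question). The function names find_pdfs and merge_pdfs are my own and only for illustration. Note that the input files are opened with 'rb': opening an existing file with 'wb', as in the commented-out attempt in the question, truncates it to zero bytes, which would likely explain both the seek error and the damaged files.

```python
import glob
import os


def find_pdfs(directory):
    """Return the sorted paths of all *.pdf files directly inside directory."""
    # glob.glob takes the pattern as a string, so it must be quoted.
    return sorted(glob.glob(os.path.join(directory, "*.pdf")))


def merge_pdfs(work_files, destination_file):
    """Merge the PDFs listed in work_files into destination_file."""
    # Imported here so find_pdfs can be used without PyPDF2 installed.
    # Assumption: a PyPDF2 version that still ships PdfFileMerger.
    from PyPDF2 import PdfFileMerger

    pdf_merger = PdfFileMerger()
    open_files = []
    try:
        for path in work_files:
            # 'rb', not 'wb': write mode would empty the input file.
            fileobj = open(path, "rb")
            open_files.append(fileobj)
            pdf_merger.append(fileobj)
        with open(destination_file, "wb") as out:
            pdf_merger.write(out)
    finally:
        # Release the merger and every input file, even on failure.
        pdf_merger.close()
        for fileobj in open_files:
            fileobj.close()
```

Usage would be something like merge_pdfs(find_pdfs("input_dir"), "merged.pdf"), where both names are placeholders. If destination_file lives in the same directory as the inputs, it should be excluded from work_files on later runs.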
Upvotes: 0