When executing this piece of code:
```python
from pypdf import PdfReader, PdfWriter
import traceback

try:
    input_pdf = PdfReader(dwnld_filepath)  # path of the downloaded input PDF
    output_pdf = PdfWriter()
    image = input_pdf.pages[0]  # first page of the input PDF
    output_pdf.add_page(image)
    output_pdf.write(file_path)  # path of the output PDF
except Exception:
    traceback.print_exc()
```
This is the complete traceback I see:
```
Traceback (most recent call last):
  File "/Users/shafeerali/Documents/Nanonets/avanto/API/test.py", line 58, in <module>
    output_pdf.add_page(image)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/_writer.py", line 418, in add_page
    return self._add_page(page, list.append, excluded_keys)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/_writer.py", line 331, in _add_page
    page = cast("PageObject", page_org.clone(self, False, excluded_keys))
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 199, in clone
    d._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 310, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 199, in clone
    d._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 310, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 199, in clone
    d._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 310, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_base.py", line 300, in clone
    obj.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 116, in clone
    arr.append(data.clone(pdf_dest, force_duplicate, ignore_fields))
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_base.py", line 292, in clone
    obj = self.get_object()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_base.py", line 312, in get_object
    obj = self.pdf.get_object(self)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/_reader.py", line 1401, in get_object
    retval = read_object(self.stream, self)  # type: ignore
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 1280, in read_object
    return DictionaryObject.read_from_stream(stream, pdf, forced_encoding)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 538, in read_from_stream
    data["streamdata"] = read_unsized_from_steam(stream, pdf)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 432, in read_unsized_from_steam
    raise PdfReadError(
pypdf.errors.PdfReadError: Unable to find 'endstream' marker for obj starting at 13367.
```
Here are the PDF file(s) that cause the issue.
I'm the maintainer of pypdf (and PyPDF2).
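The traceback indicates that your PDF is broken: pypdf could not find the 'endstream' marker that has to close the stream object starting at offset 13367. You can verify that with a PDF validator such as https://www.pdf-online.com/osa/validate.aspx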
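For a rough local check before uploading anything, a small standard-library scan can list objects whose 'stream' keyword is never closed by 'endstream' before the next 'endobj'. This is a heuristic sketch, not a validator and not how pypdf itself parses files: binary stream data can contain these keywords by accident, and the offset it reports (of the "N G obj" header) need not match the offset in pypdf's message exactly. The function name find_unterminated_streams is hypothetical.

```python
import re

# An indirect-object header such as b"12 0 obj".
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")
# The bare 'stream' keyword; \b keeps it from matching inside 'endstream'.
STREAM_KEYWORD = re.compile(rb"\bstream\b")


def find_unterminated_streams(data: bytes) -> list:
    """Return offsets of 'N G obj' headers whose stream lacks 'endstream'.

    Heuristic only: each object is assumed to end at the next b"endobj".
    """
    offsets = []
    for match in OBJ_HEADER.finditer(data):
        end = data.find(b"endobj", match.end())
        if end == -1:
            end = len(data)  # truncated file: scan to the end of the data
        body = data[match.end():end]
        if STREAM_KEYWORD.search(body) and b"endstream" not in body:
            offsets.append(match.start())
    return offsets
```

To use it, read the file in binary mode (for example `open(path, "rb").read()`) and pass the bytes in; an empty list means the heuristic found nothing suspicious.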
Although pypdf can work around many issues, it will never be able to handle every kind of broken PDF document.
If you need the file anyway, you can try to repair it first; see "How can I fix/repair a corrupted PDF file?"
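One commonly used tool for this is qpdf, which attempts to recover damaged files when it rewrites them; whether it can fix this particular missing 'endstream' is not guaranteed. The sketch below shells out to qpdf (installed separately) and treats exit status 0 (success) and 3 (success with warnings, typical after recovery) as having produced an output file, following qpdf's documented exit codes. The helper names qpdf_repair_command and repair_with_qpdf are hypothetical.

```python
import shutil
import subprocess

# qpdf exit status: 0 = success, 3 = success with warnings (common when it had
# to recover a damaged file), 2 = errors.
QPDF_SUCCESS_CODES = (0, 3)


def qpdf_repair_command(src: str, dst: str) -> list:
    """Build the qpdf command that reads src and writes a rewritten copy to dst."""
    return ["qpdf", src, dst]


def repair_with_qpdf(src: str, dst: str) -> bool:
    """Run qpdf on src; return True if qpdf reported success (possibly with warnings)."""
    if shutil.which("qpdf") is None:
        raise FileNotFoundError("qpdf is not installed or not on PATH")
    result = subprocess.run(qpdf_repair_command(src, dst), capture_output=True, text=True)
    if result.stderr:
        # qpdf reports recovery warnings and errors on stderr.
        print(result.stderr)
    return result.returncode in QPDF_SUCCESS_CODES
```

If it returns True, open the rewritten file with PdfReader exactly as in the question's code and try add_page again.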