Reputation: 1
I have a Python script which uses PyPDF2 to reverse the order of pages of a PDF.
from PyPDF2 import PdfFileWriter, PdfFileReader
output = PdfFileWriter()
rpage = []
name = input("What's the file called?")
filename = name.split('.', 1)
input1 = PdfFileReader(open(name,'rb'), strict = False)
pages = list(range(1,input1.getNumPages() + 1))
for i in range(0, (input1.getNumPages())):
    rpage.append(pages[input1.getNumPages() - i -1])
for i in rpage:
    output.addPage(input1.getPage(i-1))
outputpath = filename[0] + '-reversed.pdf'
outputStream = open(outputpath, "wb")
output.write(outputStream)
The script works as intended until it tries to write the output stream, at which point it raises this error:
PdfReadWarning: Invalid stream (index 59) within object 108 0: Stream has ended unexpectedly [pdf.py:1573]
Traceback (most recent call last):
File "D:\Documents\Google Drive\Programming\Python\PDF Scripts\reverse pdf.py", line 22, in <module>
output.write(outputStream)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 482, in write
self._sweepIndirectReferences(externalReferenceMap, self._root)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
self._sweepIndirectReferences(externMap, realdata)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
self._sweepIndirectReferences(externMap, realdata)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, data[i])
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
self._sweepIndirectReferences(externMap, realdata)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences
newobj = data.pdf.getObject(data)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject
retval = readObject(self.stream, self)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 66, in readObject
return DictionaryObject.readFromStream(stream, pdf)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 611, in readFromStream
data["__streamdata__"] = stream.read(length)
TypeError: integer argument expected, got 'NullObject'
The code does create a PDF file, but it is 0 KB in size and therefore unreadable. I also tested a sample script to merge three PDFs (found here); it produces another empty file and raises this error:
PdfReadWarning: Invalid stream (index 59) within object 108 0: Stream has ended unexpectedly [pdf.py:1573]
Traceback (most recent call last):
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1567, in _getObjectFromStream
obj = readObject(streamData, self)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 98, in readObject
return NumberObject.readFromStream(stream)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 269, in readFromStream
num = utils.readUntilRegex(stream, NumberObject.NumberPattern)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\utils.py", line 134, in readUntilRegex
raise PdfStreamError("Stream has ended unexpectedly")
PyPDF2.utils.PdfStreamError: Stream has ended unexpectedly
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Documents\Google Drive\Programming\Python\PDF Scripts\untitled1.py", line 27, in <module>
merger.write(output)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\merger.py", line 230, in write
self.output.write(fileobj)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 482, in write
self._sweepIndirectReferences(externalReferenceMap, self._root)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
self._sweepIndirectReferences(externMap, realdata)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
self._sweepIndirectReferences(externMap, realdata)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, data[i])
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
self._sweepIndirectReferences(externMap, realdata)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences
newobj = data.pdf.getObject(data)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject
retval = readObject(self.stream, self)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 66, in readObject
return DictionaryObject.readFromStream(stream, pdf)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 609, in readFromStream
length = pdf.getObject(length)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1593, in getObject
retval = self._getObjectFromStream(indirectReference)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1576, in _getObjectFromStream
raise utils.PdfReadError("Can't read object stream: %s"%e)
PyPDF2.utils.PdfReadError: Can't read object stream: Stream has ended unexpectedly
The same error is also raised when this script is used to split a PDF into its constituent pages:
from PyPDF2 import PdfFileWriter, PdfFileReader
infile = PdfFileReader(open('test.pdf', 'rb'))
for i in range(infile.getNumPages()):
    p = infile.getPage(i)
    outfile = PdfFileWriter()
    outfile.addPage(p)
    with open('page-%02d.pdf' % i, 'wb') as f:
        outfile.write(f)
The above code produces (n-1) readable PDFs, but the nth PDF is an empty file. Any idea how I can fix this?
Upvotes: 0
Views: 14043
Reputation: 11
Try uninstalling and reinstalling the PyPDF2 library. That worked for me!
Upvotes: 0
Reputation: 41
I would recommend that you use the 'merge' functionality of PyPDF2 instead of 'addPage'.
The following code snippet shows how you can append and merge files/pages:
from PyPDF2 import PdfFileMerger
merger = PdfFileMerger()
input1 = open("file1.pdf", "rb")
input2 = open("file2.pdf", "rb")
# add the first 3 pages of first file to output
merger.append(fileobj = input1, pages = (0,3))
# insert the first page of second file into the output beginning after the second page
merger.merge(position = 2, fileobj = input2, pages = (0,1))
# Write to an output PDF document
output = open("document-output.pdf", "wb")
merger.write(output)
Omit the 'pages' argument from the 'append' and 'merge' calls to merge whole files instead of specific pages.
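As a rough illustration of how the 'pages' tuple selects pages: it behaves like the arguments to Python's range(), that is (start, stop) with the stop index excluded. The helper below, selected_page_indices, is a hypothetical name used only for this sketch and is not part of PyPDF2; it just shows which zero-based page indices a given tuple would cover.

```python
def selected_page_indices(pages, num_pages):
    """Return the zero-based page indices covered by a 'pages' argument.

    Hypothetical helper for illustration only (not a PyPDF2 function).
    pages is either None (take every page) or a (start, stop[, step])
    tuple interpreted like the arguments to range().
    """
    if pages is None:
        return list(range(num_pages))
    return list(range(*pages))


# (0, 3) covers the first three pages; (0, 1) covers only the first page.
print(selected_page_indices((0, 3), 10))
print(selected_page_indices((0, 1), 10))
# None stands for "no 'pages' argument", i.e. the whole file.
print(selected_page_indices(None, 4))
```

So pages = (0,3) in the snippet above picks pages 0, 1 and 2 of the first file, not pages 0 through 3.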
Upvotes: 0
Reputation: 8127
If all you want is to reverse the pages for printing, and you don't care about preserving internal links and annotations, pdfrw might be better suited to the task than PyPDF2:
from pdfrw import PdfWriter, PdfReader
iname = input("What's the file called? ")
oname = iname.rsplit('.', 1)[0] + '-reversed.pdf'
output = PdfWriter()
output.addpages(reversed(PdfReader(iname).pages))
output.write(oname)
Disclaimer: I am the primary pdfrw author.
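A side note on the output name: the question's script uses name.split('.', 1), which splits at the first dot, while the snippet above uses rsplit('.', 1), which splits at the last one. The two give different results for file names containing more than one dot. Here is a small sketch of the difference, with os.path.splitext from the standard library as a third option; reversed_name is a hypothetical helper name chosen for this example.

```python
import os.path


def reversed_name(name):
    """Build the '-reversed.pdf' output name from an input file name.

    Hypothetical helper for illustration; os.path.splitext strips only
    the final extension.
    """
    stem, _ext = os.path.splitext(name)
    return stem + '-reversed.pdf'


name = 'report.v2.pdf'
print(name.split('.', 1)[0])   # splits at the first dot
print(name.rsplit('.', 1)[0])  # splits at the last dot
print(reversed_name(name))
```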
Upvotes: 1
Reputation: 430
Your script counts through the pages in several different places, and the purpose of each is not clear to me. I believe the way you're counting backwards is the source of your error.
I took your script and first adapted it to 2.7 (since that's what I'm running), then simplified it to walk backward through your source file once, creating your reversed file.
from PyPDF2 import PdfFileWriter, PdfFileReader
output = PdfFileWriter()
# rpage = [] removed because it's not needed anymore
name = raw_input("What's the file called? ") #Changed for the 2.7 environment
filename = name[:-4] #Simplified, since we know where the piece we want is.
input1 = PdfFileReader(open(name, "rb")) #The reader's second argument is 'strict', not a file mode, so open the file explicitly.
#Simplified, because I couldn't figure out why it was complex.
for i in range(input1.getNumPages(),0,-1):
    #getNumPages counts like a human and gives the total number of pages
    #This counts backwards, so no need to count forward and use that to
    #reverse the numbers.
    output.addPage(input1.getPage(i-1))
    #getPage counts like a computer and needs to finish with page 0.
outputpath = filename + '-reversed.pdf'
outputStream = open(outputpath, "wb")
output.write(outputStream)
outputStream.close() #Closes the file and stream once you're done.
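To make the index arithmetic in the loop explicit, here is a small sketch that uses only built-ins. reversed_page_indices is a hypothetical helper, not a PyPDF2 function; it shows that counting down from the page count and subtracting 1 gives the same zero-based indices as reversed(range(n)).

```python
def reversed_page_indices(num_pages):
    """Return zero-based page indices from the last page to the first.

    Hypothetical helper for illustration; mirrors the loop
    'for i in range(num_pages, 0, -1)' followed by 'i - 1'.
    """
    return [i - 1 for i in range(num_pages, 0, -1)]


# The countdown form and reversed(range(n)) produce the same order.
assert reversed_page_indices(4) == list(reversed(range(4)))
print(reversed_page_indices(4))
```

Either form could drive the addPage loop; the countdown is just the version used in the script above.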
Upvotes: 1