Reputation: 25
I'm using the PyPDF2 library to extract text from a PDF, but I'm having trouble writing the loop over its pages.
I'm using the following code and I can extract a string from the first page.
from PyPDF2 import PdfFileReader

reader = PdfFileReader("mypdf.pdf")
# Print number of pages
num_page = reader.getNumPages()
print(num_page)
# Print the number of pages where [0] is the first page
page = reader.pages[0]
print(page.extractText())
I would like to take the page count returned by .getNumPages() and loop that many times over reader.pages, instead of reading only reader.pages[0].
This is the code I'm trying, to print all 99 pages:
from PyPDF2 import PdfFileReader

reader = PdfFileReader("mypdf.pdf")
# Print number of pages
num_page = reader.getNumPages()
print(num_page)
# Print the number of pages where [0] is the first page
page = reader.pages[0]
i = 0
print(type(num_page))
print(type(i))
for i in page:
    if i < num_page:
        page = reader.pages[i]
        print(page.extractText())
        i = i + 1
    else:
        print("done")
Error occurred:
Traceback (most recent call last):
  File "/home/wilian/PycharmProjects/ExtractText/pypdf.py", line 13, in <module>
    if i < num_page:
TypeError: '<' not supported between instances of 'NameObject' and 'int'
99
<class 'int'>
<class 'int'>
Process finished with exit code 1
Upvotes: 1
Views: 178
Reputation: 12495
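Try a simple for loop over range(). In your code, "for i in page" iterates over the keys of the page object (NameObject instances), not over page indices, so "i < num_page" ends up comparing a NameObject with an int.
Example: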
from PyPDF2 import PdfFileReader

def pdf_info():
    with open("my_pdf.pdf", "rb") as f:
        reader = PdfFileReader(f)
        for i in range(reader.getNumPages()):
            print(i)
            # page = reader.pages[i]
            # print(page.extractText())

if __name__ == '__main__':
    pdf_info()
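If you want the text of every page rather than just the index, the same range loop can be wrapped in a small generator. This is only a sketch: page_texts is a hypothetical helper name, and it assumes the reader exposes getNumPages(), pages[i] and extractText() as in the PyPDF2 version used in the question (newer releases rename these, for example PdfReader and extract_text()).

```python
def page_texts(reader):
    """Yield (page_number, text) for each page of a PyPDF2-style reader.

    Assumption: `reader` provides getNumPages() and an indexable `pages`
    attribute whose items have extractText(), as in the question's code.
    """
    # Loop over integer indices, not over a page object itself.
    for index in range(reader.getNumPages()):
        page = reader.pages[index]
        # Report 1-based page numbers alongside the extracted text.
        yield index + 1, page.extractText()


# Possible usage with the question's file (needs PyPDF2 and mypdf.pdf):
#   from PyPDF2 import PdfFileReader
#   with open("mypdf.pdf", "rb") as f:
#       for number, text in page_texts(PdfFileReader(f)):
#           print(number, text)
```

Keeping the file open inside the with block while the generator is consumed matters, because the pages are read lazily from the open file.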
Upvotes: 1