Reputation: 214
I would like to import a PDF file and find the most common words.
import PyPDF2
# Open the PDF file and read the text
pdf_file = open("nita20.pdf", "rb")
pdf_reader = PyPDF2.PdfReader(pdf_file)
text = ""
for page in range(pdf_reader.pages):
text += pdf_reader.getPage(page).extractText()
I get this error:
TypeError: '_VirtualList' object cannot be interpreted as an integer
How to resolve this issue? So I can extract every word from the PDF file, thanks.
Upvotes: 0
Views: 318
Reputation: 14920
I got some deprecation warnings on your code, but this works (tested on Python 3.11, PyPDF2 version: 3.0.1)
import PyPDF2
# Open the PDF file and read the text
pdf_file = open("..\test.pdf", "rb")
pdf_reader = PyPDF2.PdfReader(pdf_file)
text = ""
i=0
print(len(pdf_reader.pages))
for page in range(len(pdf_reader.pages)):
text += pdf_reader.pages[i].extract_text()
i=i+1
print(text)
Upvotes: 1