Reputation: 123
I am trying to extract the content of a PDF in order to create an Excel sheet from it.
import pdfquery
pdf = pdfquery.PDFQuery('C:\\Users\\Santosh\\Downloads\\2017-San-Jamar-Price-List-US-Z120913E-RevA.pdf')
page = pdf.get_page(3)
page_content = page.extractText()
print (page_content)
It throws the following error:
AttributeError Traceback (most recent call last)
<ipython-input-32-d6b615faa422> in <module>()
1 page = pdf.get_page(3)
----> 2 page_content = page.extractText()
3 print (page_content)
AttributeError: 'PDFPage' object has no attribute 'extractText'
Please let me know a possible solution.
Upvotes: 1
Views: 16881
Reputation: 128
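Use PyPDF2 instead of pdfquery. The PDFPage object returned by pdfquery's get_page() has no extractText attribute (as the traceback shows), whereas a PyPDF2 page provides extract_text():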
from PyPDF2 import PdfReader
reader = PdfReader('C:\\Users\\Santosh\\Downloads\\2017-San-Jamar-Price-List-US-Z120913E-RevA.pdf')
page = reader.pages[3]
print(page.extract_text())
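If the same script has to run against different PyPDF2 installs, note that older releases named the page method extractText() while newer ones use extract_text(). A minimal sketch of a helper that tries both names is below; page_text is a hypothetical name introduced here for illustration, not part of PyPDF2.

```python
def page_text(page):
    """Return the text of a page object using whichever method it exposes.

    Hypothetical helper: tries the newer snake_case name first, then the
    older camelCase name found in early PyPDF2 releases.
    """
    for name in ("extract_text", "extractText"):
        method = getattr(page, name, None)
        if callable(method):
            # Some pages yield None or an empty string; normalise to str.
            return method() or ""
    raise AttributeError(
        f"{type(page).__name__} has no extract_text/extractText method"
    )

# Assumed usage with the reader from the answer above:
# text = page_text(reader.pages[3])
```

A pdfquery PDFPage has neither method, so the helper would still raise AttributeError for it; the fix remains to read the file with PyPDF2.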
Upvotes: 2
Reputation: 513
I uninstalled both PyPDF and PyPDF2, then reinstalled PyPDF2, and the issue was resolved.
pip uninstall PyPDF
pip uninstall PyPDF2
pip install PyPDF2
Upvotes: 0
Reputation: 502
I also faced the same issue. It was caused by an outdated version of the PyPDF2 package that had already been installed alongside other PDF reader dependencies. Reinstalling PyPDF2 resolved the error.
pip uninstall pypdf2
pip install pypdf2
This worked for me
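Before reinstalling, it may help to check which PDF packages and versions are actually present in the environment. The sketch below uses only the standard library (Python 3.8+); installed_version is a hypothetical helper name, and the list of distribution names to check is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Distribution names are assumptions; adjust to what your environment uses.
for name in ("PyPDF2", "pypdf", "pdfquery"):
    print(name, installed_version(name))
```

If an old PyPDF2 version shows up here, that would be consistent with the stale-install explanation above.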
Upvotes: 1