Santosh

Reputation: 123

AttributeError: 'PDFPage' object has no attribute 'extractText'

I am trying to extract the content from a PDF in order to create an excel sheet out of it.

What I tried

import pdfquery
pdf = pdfquery.PDFQuery('C:\\Users\\Santosh\\Downloads\\2017-San-Jamar-Price-List-US-Z120913E-RevA.pdf')
page = pdf.get_page(3)
page_content = page.extractText()
print (page_content)

It throws the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-32-d6b615faa422> in <module>() 
      1 page = pdf.get_page(3)
----> 2 page_content = page.extractText()
      3 print (page_content)

AttributeError: 'PDFPage' object has no attribute 'extractText'

Please let me know a possible solution.

Upvotes: 1

Views: 16881

Answers (3)

Tejas Mankar

Reputation: 128

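Use PyPDF2 instead of pdfquery. The page object that pdfquery returns has no extractText method, as the error says; in PyPDF2 the page method is extract_text. Note that reader.pages is zero-indexed, so pages[3] is the fourth page.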

from PyPDF2 import PdfReader

reader = PdfReader('C:\\Users\\Santosh\\Downloads\\2017-San-Jamar-Price-List-US-Z120913E-RevA.pdf')
page = reader.pages[3]
print(page.extract_text())
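
If it is unclear which PyPDF2 release is installed, one option is a small helper that tries the newer extract_text name first and falls back to the older extractText. This is only a minimal sketch: page_text and the two stand-in classes are hypothetical names made up for illustration, and it assumes the page object exposes one of those two methods.

```python
def page_text(page):
    """Return the text of a page-like object, whichever method name it offers."""
    # Prefer the newer snake_case name, then fall back to the older camelCase one.
    for name in ("extract_text", "extractText"):
        method = getattr(page, name, None)
        if callable(method):
            return method()
    raise AttributeError(
        f"{type(page).__name__!r} object has neither extract_text nor extractText"
    )


# Hypothetical stand-ins for page objects, used only to exercise the helper.
class NewStylePage:
    def extract_text(self):
        return "new"


class OldStylePage:
    def extractText(self):
        return "old"


print(page_text(NewStylePage()))
print(page_text(OldStylePage()))
```

With a real reader this would be called as page_text(reader.pages[3]). An object that offers neither method, such as the PDFPage from the question, still raises AttributeError.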

Upvotes: 2

Hammad Zafar Bawara

Reputation: 513

I uninstalled both PyPDF and PyPDF2, then reinstalled PyPDF2, and the issue was resolved.

pip uninstall PyPDF
pip uninstall PyPDF2
pip install PyPDF2

Upvotes: 0

Berlin Benilo

Reputation: 502

I faced the same issue. It was caused by an outdated version of the pypdf2 package that had already been installed along with other PDF reader dependencies. Reinstalling pypdf2 resolved the error.

pip uninstall pypdf2
pip install pypdf2

This worked for me
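
Before reinstalling, it may help to check which version is actually installed. The following is a rough sketch using only the standard library (Python 3.8 or later); installed_version is a hypothetical helper name, and the package names in the loop are just the ones mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report what is installed for the packages discussed in this thread.
for name in ("PyPDF2", "pdfquery"):
    print(name, installed_version(name) or "not installed")
```

If the reported PyPDF2 version is old, that would be consistent with the reinstall fixing the error.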

Upvotes: 1
