Reputation: 61
I am trying to convert a PDF to text, but importing the PDFPage class fails with the error shown below. I have searched for the error and found nothing useful. I have also installed pdfminer.six for Python 3.5, but that did not solve it. Please help.
Code:
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
import os
import sys, getopt

# converts pdf, yields the text content of each page as a string
def extract_text_from_pdf(pdf_path):
    with open(pdf_path, 'rb') as fh:
        for page in PDFPage.get_pages(fh,
                                      caching=True,
                                      check_extractable=True):
            resource_manager = PDFResourceManager()
            fake_file_handle = io.StringIO()
            converter = TextConverter(resource_manager, fake_file_handle, codec='utf-8', laparams=LAParams())
            page_interpreter = PDFPageInterpreter(resource_manager, converter)
            page_interpreter.process_page(page)
            text = fake_file_handle.getvalue()
            yield text
            # close open handles
            converter.close()
            fake_file_handle.close()
Error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/system/anaconda3/lib/python3.6/site-packages/pdfminer/pdfpage.py", line 5, in <module>
    from .pdftypes import PDFObjectNotFound
ImportError: cannot import name 'PDFObjectNotFound'
Upvotes: 4
Views: 11252
Reputation: 831
Try removing both packages and then reinstalling only pdfminer.six:
pip3 uninstall pdfminer
pip3 uninstall pdfminer-six
pip3 install pdfminer-six
Upvotes: 0
Reputation: 27466
Uninstall pdfminer3k (if you have it installed):
$ pip uninstall pdfminer3k
and install pdfminer.six using the command below:
$ python -m pip install pdfminer.six
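Before uninstalling anything, it can help to see which of these distributions are actually present in the environment, since they all install into the same pdfminer package directory. Here is a minimal sketch, assuming Python 3.8 or newer (for importlib.metadata); the function name and the list of distribution names are my own choices for illustration, not part of pdfminer:

```python
from importlib import metadata

# Distribution names mentioned in this thread; all of them provide a
# top-level package called "pdfminer", so they can overwrite each other.
CANDIDATES = ("pdfminer", "pdfminer3k", "pdfminer.six")


def installed_pdfminer_distributions(names=CANDIDATES):
    """Return {distribution name: version} for each name that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment; skip it.
            continue
    return found


if __name__ == "__main__":
    found = installed_pdfminer_distributions()
    if not found:
        print("No pdfminer distribution found.")
    for name, version in found.items():
        print(f"{name} {version}")
    if len(found) > 1:
        print("More than one pdfminer variant is installed; they may conflict.")
```

If more than one entry is printed, uninstalling all of them and reinstalling only pdfminer.six is the likely fix.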
Upvotes: 6
Reputation: 483
Add the following line at the beginning of your code and give it a shot:
from io import StringIO
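Note that the code in the question refers to the class as io.StringIO(), so the form of the import matters: `from io import StringIO` binds only the name StringIO, while `import io` binds the module name that `io.StringIO()` needs. A small standard-library sketch of the difference (the helper collect is hypothetical and only stands in for the in-memory buffer used in the question):

```python
import io                # binds the module name, so io.StringIO() works
from io import StringIO  # binds only the class name, so StringIO() works


def collect(chunks):
    """Write each chunk into an in-memory text buffer and return the result."""
    buffer = io.StringIO()  # would raise NameError without "import io"
    try:
        for chunk in chunks:
            buffer.write(chunk)
        return buffer.getvalue()
    finally:
        buffer.close()
```

Either import works as long as the call site matches it. This only addresses the missing io name and is separate from the ImportError in the traceback.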
Upvotes: 0