Reputation: 129
I am trying to convert the text in a PDF file to plain text or HTML, but I keep getting this error: "cannot import name 'process_pdf' from 'pdfminer.pdfinterp'". How can I fix it?
I first tried this code in Visual Studio, where I got an indentation error caused by spaces, so I moved to a Jupyter notebook and got the import error shown after the code.
from io import StringIO
from pdfminer.pdfinterp import PDFResourceManager, process_pdf
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
def to_txt(pdf_path):
    input_ = open(pdf_path, 'rb')
    output = StringIO()
    manager = PDFResourceManager()
    converter = TextConverter(manager, output, laparams=LAParams())
    process_pdf(manager, converter, input_)
    return output.getvalue()
b = to_txt(rb"C:\Users\Jasvinder Singh\Desktop\HACK-IN REPORT.docx")
ImportError: cannot import name 'process_pdf' from 'pdfminer.pdfinterp' (C:\Users\Jasvinder Singh\Anaconda3\lib\site-packages\pdfminer\pdfinterp.py)
Upvotes: 2
Views: 9351
Reputation: 1263
Please see the documentation and this comment on a bug.
The process_pdf method has been replaced by PDFPage.get_pages().
Upvotes: 2