simarpreetsingh.019

Reputation: 129

How to fix 'cannot import name 'process_pdf' from 'pdfminer.pdfinterp'' error

I am trying to convert the text in a PDF file to plain text or HTML, but I keep getting this error: "cannot import name 'process_pdf' from 'pdfminer.pdfinterp'". How can I fix it?

I first tried this code in the visual basic studio, where it failed with an indentation error due to spaces, so I then tried it in a Jupyter notebook and got the error shown below.

from io import StringIO
from pdfminer.pdfinterp import PDFResourceManager , process_pdf
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams



def to_txt(pdf_path):
    input_ = file(pdf_path , 'rb')
    output = StringIO()

    manager = PDFResourceManager()
    converter = TextConverter(manager, output, laparams = LAParams())
    process_pdf(manager, converter, input_)

    return output.getvalue()

b = to_txt(rb"C:\Users\Jasvinder Singh\Desktop\HACK-IN REPORT.docx")

ImportError: cannot import name 'process_pdf' from 'pdfminer.pdfinterp' (C:\Users\Jasvinder Singh\Anaconda3\lib\site-packages\pdfminer\pdfinterp.py)

Upvotes: 2

Views: 9351

Answers (1)

vekerdyb

Reputation: 1263

Please see the documentation and this comment on a bug.

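The process_pdf function is no longer available in pdfminer.pdfinterp. It has been replaced by PDFPage.get_pages(): iterate over the pages it yields and pass each one to PDFPageInterpreter.process_page().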
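
As a rough sketch of what that replacement could look like, here is the question's to_txt function rewritten around PDFPage.get_pages(). It assumes the pdfminer.six package is installed; the local imports and the use of open() instead of the Python 2 file() builtin are my own choices, not something taken from the linked documentation.

```python
from io import StringIO


def to_txt(pdf_path):
    """Return the text of the PDF at pdf_path as a single string."""
    # Imports are kept local so this sketch only needs pdfminer.six
    # (an assumption) at the moment the function is actually called.
    from pdfminer.converter import TextConverter
    from pdfminer.layout import LAParams
    from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
    from pdfminer.pdfpage import PDFPage

    output = StringIO()
    manager = PDFResourceManager()
    converter = TextConverter(manager, output, laparams=LAParams())
    # The interpreter feeds each page's content to the converter.
    interpreter = PDFPageInterpreter(manager, converter)

    # open() replaces the Python 2 file() builtin used in the question.
    with open(pdf_path, "rb") as input_:
        # This loop takes the place of the removed process_pdf() call.
        for page in PDFPage.get_pages(input_):
            interpreter.process_page(page)

    converter.close()
    return output.getvalue()
```

You would call it as text = to_txt("example.pdf"), where example.pdf is a placeholder name. Note that the path has to point to an actual PDF file, since pdfminer only parses PDFs.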

Upvotes: 2
