Dushyanth Gokhale

Reputation: 21

Getting error while extracting text from Image with type 'PIL.PpmImagePlugin.PpmImageFile' using pytesseract

I am trying to extract text from an image whose type is 'PIL.PpmImagePlugin.PpmImageFile' using pytesseract. The code and the error are below.

from pdf2image import convert_from_path
from PIL import Image
import pytesseract as pyt

pages = convert_from_path('D:/pdf_csv/HealthCare/eRDS - ML/eRDS - ML/2001468/2001468,69,70.pdf', poppler_path='C:/Users/Hp/poppler-0.68.0/bin')
# This is the line that raises the error shown below
text = pyt.image_to_string(Image.open(pages[0]), lang='eng')

The error I am getting:

AttributeError: 'PpmImageFile' object has no attribute 'read'

Alternatively, is there a way to convert the PpmImageFile to 'jpg' or 'png' format?

Upvotes: 2

Views: 6335

Answers (1)

Belval

Reputation: 1506

Add fmt='jpeg' or fmt='png' to your convert_from_path call to get non-PPM images from pdf2image.

In your example, change

pages = convert_from_path('D:/pdf_csv/Health....001468,69,70.pdf',poppler_path='C:/Users/Hp/poppler-0.68.0/bin')

to

pages = convert_from_path('D:/pdf_csv/Health...001468,69,70.pdf', fmt='jpeg', poppler_path='C:/Users/Hp/poppler-0.68.0/bin')
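Note that the AttributeError in the question appears to come from calling Image.open() on an object that is already a PIL image: convert_from_path returns a list of in-memory PIL images, while Image.open() expects a file path or a file-like object with a read method. A minimal sketch of passing the page to pytesseract directly, and of converting it to PNG yourself, might look like the following. The helper names page_to_png_bytes and ocr_first_page are hypothetical and only for illustration; it assumes poppler and Tesseract are installed.

```python
import io


def page_to_png_bytes(page):
    """Encode an in-memory PIL image (e.g. a pdf2image page) as PNG bytes."""
    buffer = io.BytesIO()
    # PIL images expose save(); format="PNG" re-encodes the PPM page as PNG
    page.save(buffer, format="PNG")
    return buffer.getvalue()


def ocr_first_page(pdf_path, poppler_path=None, lang="eng"):
    """Render the first page of a PDF and run Tesseract OCR on it."""
    # Imported here so the helper above can be used without poppler/Tesseract
    from pdf2image import convert_from_path
    import pytesseract

    # Render only page 1 instead of the whole document
    pages = convert_from_path(
        pdf_path, first_page=1, last_page=1, poppler_path=poppler_path
    )
    # pages[0] is already a PIL image, so it is passed as-is (no Image.open)
    return pytesseract.image_to_string(pages[0], lang=lang)
```

To write a page to disk instead of keeping it in memory, pages[0].save('page1.png') should also work, since each page is an ordinary PIL image.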

Upvotes: 4
