CODEWITHSUNDEEP

python python-tesseract

Dushyanth Gokhale

Reputation: 21

Getting error while extracting text from Image with type 'PIL.PpmImagePlugin.PpmImageFile' using pytesseract

I am trying to extract text from an image whose type is 'PIL.PpmImagePlugin.PpmImageFile' using pytesseract. The code and the error are below.

import pytesseract as pyt
from PIL import Image
from pdf2image import convert_from_path
pages = convert_from_path('D:/pdf_csv/HealthCare/eRDS - ML/eRDS - ML/2001468/2001468,69,70.pdf',poppler_path='C:/Users/Hp/poppler-0.68.0/bin')
text = pyt.image_to_string(Image.open(pages[0]), lang='eng')

Error I am getting:

AttributeError: 'PpmImageFile' object has no attribute 'read'

Alternatively, is there a way to convert the PpmImageFile to 'jpg' or 'png' format?

Upvotes: 2

Views: 6414

Answers (1)

Reputation: 1516

Add fmt='jpeg' or fmt='png' to your function call to get non-PPM images from pdf2image.

In your example, change

pages = convert_from_path('D:/pdf_csv/Health....001468,69,70.pdf',poppler_path='C:/Users/Hp/poppler-0.68.0/bin')

to

pages = convert_from_path('D:/pdf_csv/Health...001468,69,70.pdf', fmt='jpeg', poppler_path='C:/Users/Hp/poppler-0.68.0/bin')
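It may also be worth checking the Image.open() call itself. Image.open() expects a file path or a file-like object with a read() method, while convert_from_path already returns PIL image objects, which would explain the "no attribute 'read'" error. Below is a minimal sketch under that assumption. The helper names to_png and ocr_first_page are made up for illustration, and the second function assumes Poppler and Tesseract are installed and that Poppler is either on PATH or passed via poppler_path.

```python
from io import BytesIO

from PIL import Image  # Pillow


def to_png(page):
    """Re-encode any PIL image (e.g. a PpmImageFile) as an in-memory PNG."""
    buffer = BytesIO()
    page.save(buffer, format="PNG")
    buffer.seek(0)
    # Image.open() is fine here because buffer is a file-like object.
    return Image.open(buffer)


def ocr_first_page(pdf_path, poppler_path=None):
    """Hypothetical helper: OCR the first page of a PDF."""
    # Imported lazily so to_png() works without Poppler or Tesseract.
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(pdf_path, poppler_path=poppler_path)
    # pages[0] is already a PIL image, so it is passed directly
    # instead of being wrapped in Image.open().
    return pytesseract.image_to_string(pages[0], lang="eng")
```

To write a page to disk in another format instead, something like pages[0].save('page.png', 'PNG') should also work, since every PIL image has a save() method.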

Upvotes: 4
