Python-tesseract does not recognize anything

Question

This is the image I am importing:

My Python code:

try:
    import Image
except ImportError:
    from PIL import Image
import pytesseract

print(pytesseract.image_to_string(Image.open('/home/milenko/Pictures/Screenshot from 2018-03-06 19-03-19.png')))

When I run the script with

python a72.py

the output is just an empty line, which makes no sense to me. Why?

Mateusz Kleinert · Accepted Answer

Try tweaking your command a little, e.g. by choosing a different Page Segmentation Mode (PSM). The default mode (3) is "Fully automatic page segmentation, but no OSD.", so Tesseract does not perform orientation and script detection (OSD).

Sparse text with OSD (PSM 12) gives me some output (Tesseract 4 and later spell the option --psm; older 3.x releases used -psm):

print(pytesseract.image_to_string(Image.open('image.png'), config='--psm 12'))
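If you are not sure which mode fits your image, one option is to loop over a few PSM values and keep the first non-empty result. This is a minimal sketch assuming Tesseract 4 or later (which accepts --psm); the names PSM_MODES and first_nonempty_ocr are hypothetical, and the ocr parameter defaults to pytesseract.image_to_string but accepts any callable taking (image, config=...).

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# A few Tesseract 4+ page segmentation modes (full list: `tesseract --help-psm`).
PSM_MODES: Dict[int, str] = {
    3: "Fully automatic page segmentation, but no OSD (default)",
    4: "Assume a single column of text of variable sizes",
    6: "Assume a single uniform block of text",
    11: "Sparse text: find as much text as possible in no particular order",
    12: "Sparse text with OSD",
}


def first_nonempty_ocr(
    image,
    modes: Iterable[int] = (3, 6, 11, 12),
    ocr: Optional[Callable[..., str]] = None,
) -> Optional[Tuple[int, str]]:
    """Return (psm, text) for the first mode that yields non-blank text, else None."""
    if ocr is None:
        # Imported lazily so the helper can also be driven by a stub OCR function.
        import pytesseract
        ocr = pytesseract.image_to_string
    for psm in modes:
        text = ocr(image, config=f"--psm {psm}")
        if text.strip():
            return psm, text.strip()
    return None
```

With PIL this would be called as first_nonempty_ocr(Image.open('image.png')); trying the modes in order means the cheap default is tested first and the sparse-text modes only kick in when it finds nothing.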

You can also use OpenCV to preprocess the image for OCR, e.g.:

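#!/usr/bin/python

import cv2 as cv
import pytesseract
from PIL import Image
from matplotlib import pyplot as plt

# Load the screenshot as a single-channel grayscale image
img = cv.imread('/tmp/image.png', cv.IMREAD_GRAYSCALE)

# Pixels brighter than 220 become white (255), everything else black (0)
ret, thresh = cv.threshold(img, 220, 255, cv.THRESH_BINARY)

# Show the binarized image to check that the text survived thresholding
plt.axis('off')
plt.imshow(thresh, 'gray')
plt.show()

# Convert the NumPy array to a PIL image and run sparse-text OCR on it
print(pytesseract.image_to_string(Image.fromarray(thresh), config='--psm 12'))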
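For reference, cv.threshold with THRESH_BINARY sets each pixel to maxval (255 here) where it is strictly greater than the threshold (220 here) and to 0 otherwise. The sketch below reproduces that rule in plain NumPy, which can be handy for experimenting with threshold values; the function name binary_threshold is hypothetical.

```python
import numpy as np


def binary_threshold(gray: np.ndarray, thresh: int = 220, maxval: int = 255) -> np.ndarray:
    """Mimic cv.threshold(gray, thresh, maxval, cv.THRESH_BINARY)[1] on a uint8 image."""
    # Pixels strictly brighter than `thresh` become `maxval`; everything else becomes 0.
    return np.where(gray > thresh, maxval, 0).astype(np.uint8)


# A tiny 2x3 "image": light background values mixed with darker strokes.
sample = np.array([[255, 230, 200], [10, 221, 220]], dtype=np.uint8)
print(binary_threshold(sample))
```

Note that a pixel exactly equal to 220 ends up black, so a high threshold like this keeps only the near-white background white and turns light-gray antialiasing around the text black, which thickens the glyphs for Tesseract.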

As a next step, you could split the image into parts (x-axis, y-axis, trend line) and run OCR on each part separately, with a PSM value suited to each one.
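A rough sketch of that idea: crop regions out of the thresholded NumPy array with slicing and OCR each crop with its own PSM. The region names, pixel boxes and PSM choices in REGIONS are hypothetical placeholders that you would measure on your own image, and ocr_regions is a hypothetical helper; by default it converts each crop with PIL's Image.fromarray and passes it to pytesseract.image_to_string.

```python
from typing import Callable, Dict, Optional, Tuple

import numpy as np

# Hypothetical regions: name -> ((top, bottom, left, right) in pixels, PSM to use).
REGIONS: Dict[str, Tuple[Tuple[int, int, int, int], int]] = {
    "x_axis": ((380, 420, 40, 640), 7),  # one line of tick labels
    "y_axis": ((0, 380, 0, 40), 6),      # a block of stacked tick labels
    "plot": ((0, 380, 40, 640), 11),     # sparse text around the trend line
}


def _tesseract_ocr(crop: np.ndarray, config: str = "") -> str:
    """Default OCR backend: NumPy crop -> PIL image -> Tesseract."""
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.fromarray(crop), config=config)


def ocr_regions(
    image: np.ndarray,
    regions: Dict[str, Tuple[Tuple[int, int, int, int], int]] = REGIONS,
    ocr: Optional[Callable[..., str]] = None,
) -> Dict[str, str]:
    """OCR each cropped region separately with its own page segmentation mode."""
    ocr = ocr or _tesseract_ocr
    results = {}
    for name, ((top, bottom, left, right), psm) in regions.items():
        crop = image[top:bottom, left:right]  # rows first, then columns
        results[name] = ocr(crop, config=f"--psm {psm}").strip()
    return results
```

Keeping each region's PSM next to its box makes it easy to tune one part of the chart (say, a single-line mode for the x-axis labels) without affecting the others.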
