Richard Rublev

Reputation: 8160

Python-tesseract does not recognize anything

This is the image that I will import: [image]

My Python code:

try:
    import Image
except ImportError:
    from PIL import Image
import pytesseract

print(pytesseract.image_to_string(Image.open('/home/milenko/Pictures/Screenshot from 2018-03-06 19-03-19.png')))

When I run the code:

python a72.py 

As output I get an empty line. It does not make any sense. Why?

Upvotes: 0

Views: 1140

Answers (1)

Mateusz Kleinert

Reputation: 1376

Try tweaking your command a little, e.g. by using a different Page Segmentation Mode (PSM). The default value is "Fully automatic page segmentation, but no OSD.", so it does not perform orientation and script detection (OSD).

This one gives me some output:

print(pytesseract.image_to_string(Image.open('image.png'), config='-psm 12'))
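Since it is hard to know in advance which mode suits a given image, one option is to loop over a few candidates and compare what comes back. The following is only a sketch: `try_psm_modes` and `non_empty` are hypothetical helper names, not part of pytesseract, and the `flag` parameter is there because the spelling depends on the Tesseract version (the call above uses `-psm`, while Tesseract 4 and later expect `--psm`).

```python
def try_psm_modes(image, modes, ocr=None, flag="--psm"):
    """Run OCR once per page segmentation mode; return {mode: stripped text}."""
    if ocr is None:
        # Imported lazily so the helper can be exercised without Tesseract.
        import pytesseract
        ocr = pytesseract.image_to_string
    results = {}
    for mode in modes:
        # Builds e.g. "--psm 12" (or "-psm 12" if flag="-psm" is passed).
        config = "{} {}".format(flag, mode)
        results[mode] = ocr(image, config=config).strip()
    return results


def non_empty(results):
    """Keep only the modes that produced some text."""
    return {mode: text for mode, text in results.items() if text}
```

With Pillow installed, it could be called as `non_empty(try_psm_modes(Image.open('image.png'), [3, 6, 11, 12]))`; which modes are worth trying is a guess that depends on the image.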

You can use OpenCV to prepare this image for OCR, e.g.:

#!/usr/bin/python

import cv2 as cv
import pytesseract

from matplotlib import pyplot as plt

# Load the image as grayscale (flag 0)
img = cv.imread('/tmp/image.png', 0)
# Binarize: pixels brighter than 220 become white (255), the rest black
ret, thresh = cv.threshold(img, 220, 255, cv.THRESH_BINARY)

# Display the thresholded image to check the result before running OCR
plt.axis('off')
plt.imshow(thresh, 'gray')
plt.show()

print(pytesseract.image_to_string(thresh, config='-psm 12'))

As a next step, you could split this image into parts (x-axis, y-axis, trend line) and run OCR on each part separately, with a suitable PSM value for each one.
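A rough sketch of that per-region idea follows. Everything specific in it is an assumption: `crop_regions` and `ocr_by_region` are hypothetical helpers, the boxes in `REGIONS` are made up for a 600x400 image and would have to be measured on the real screenshot, and the PSM choices are just plausible starting points. It expects the image as a NumPy array, which is what `cv.imread` and `cv.threshold` return.

```python
def crop_regions(img, regions):
    """Cut named sub-images out of img; each region is (x, y, width, height)."""
    # NumPy images are indexed [row, column], i.e. [y, x].
    return {name: img[y:y + h, x:x + w]
            for name, (x, y, w, h) in regions.items()}


def ocr_by_region(img, regions, psm_by_region, ocr=None, flag="--psm"):
    """Run OCR on each region with its own page segmentation mode."""
    if ocr is None:
        # Imported lazily so the cropping helper works without Tesseract.
        import pytesseract
        ocr = pytesseract.image_to_string
    results = {}
    for name, crop in crop_regions(img, regions).items():
        config = "{} {}".format(flag, psm_by_region[name])
        results[name] = ocr(crop, config=config)
    return results


# Hypothetical layout for a 600x400 plot; adjust the boxes to the real image.
REGIONS = {
    "y_axis": (0, 0, 60, 360),
    "x_axis": (60, 360, 540, 40),
    "plot_area": (60, 0, 540, 360),
}
# Assumed modes: 6 = single uniform block, 7 = single text line, 11 = sparse text.
PSM = {"y_axis": 6, "x_axis": 7, "plot_area": 11}
```

With the thresholded image from the script above, this might be used as `ocr_by_region(thresh, REGIONS, PSM)`. Pass `flag="-psm"` on older Tesseract versions that use the single-dash spelling.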

Upvotes: 2
