Saksham Rustagi

Reputation: 29

Using pytesseract to get text from an image

I'm trying to use pytesseract to convert some images into text. The images are very basic and I tried using some preprocessing:

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.bitwise_not(gray)
gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

The original image looks like this:

Original

The resulting image looks like this:

Processed

I do this for a bunch of numbers that use the same font and sit in the same locations. Here are the results:

Results Python

For most of the images, pytesseract still returns no text. It works for a few of them, even though the images look nearly identical.

Here is a snippet of the code I'm using:

def checkCurrentState():
    # Screenshot capture, disabled while testing against a saved file:
    # image = pyautogui.screenshot()
    # image = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)
    # cv2.imwrite("screenshot.png", image)

    image = cv2.imread("screenshot.png")

    checkNumbers(image)



def checkNumbers(image):
    numbers = []
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.bitwise_not(gray)
    gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]


    for i in storeLocations:
        cropped = gray[i[1]:i[1]+storeHeight, i[0]:i[0]+storeWidth]
        number = pytesseract.image_to_string(cropped)
        numbers.append(number)
        print(number)
        cv2.imshow("Screenshot", cropped)
        cv2.waitKey(0)

Upvotes: 2

Views: 2253

Answers (1)

nathancy

Reputation: 46600

To perform OCR on an image, it's important to preprocess the image first. The idea is to obtain a processed image where the text to extract is black and the background is white. Here's a simple approach using OpenCV and Pytesseract OCR.

To do this, we convert the image to grayscale, apply a slight Gaussian blur, then apply Otsu's threshold to obtain a binary image. From there, a morphological opening removes small noise. We perform text extraction with the --psm 6 configuration option, which makes Tesseract assume a single uniform block of text. The other page segmentation modes are listed by running tesseract --help-psm.
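Since the crops in the question each hold a single short number, it may be worth trying other page segmentation modes as well. Here is a minimal sketch of a helper that builds the config string. The name build_tesseract_config is hypothetical, and the digit whitelist is an assumption that is not part of the approach above; whether Tesseract honours tessedit_char_whitelist can depend on the installed version and OCR engine.

```python
def build_tesseract_config(psm=6, whitelist=None):
    """Build a config string to pass to pytesseract.image_to_string.

    psm: Tesseract page segmentation mode (6 = single uniform block of
         text, 7 = single text line).
    whitelist: optional string of allowed characters (an assumption; it
         is not used in the answer's code).
    """
    parts = [f"--psm {psm}"]
    if whitelist:
        # tessedit_char_whitelist restricts the characters Tesseract may output.
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


# Example usage (requires pytesseract and a Tesseract install):
# text = pytesseract.image_to_string(
#     opening, config=build_tesseract_config(psm=7, whitelist="0123456789"))
```

With the defaults, the helper produces the same --psm 6 option used in the code further down.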


Here's a visualization of each step:

Input image


Convert to grayscale -> Gaussian blur


Otsu's threshold -> Morph open to remove noise


Result from Pytesseract OCR

1100

Code

import cv2
import pytesseract

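# Path to the Tesseract executable on Windows; adjust it, or remove this line if tesseract is on your PATH
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"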

# Grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Morph open to remove noise
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)

# Perform text extraction
data = pytesseract.image_to_string(opening, lang='eng', config='--psm 6')
print(data)

cv2.imshow('blur', blur)
cv2.imshow('thresh', thresh)
cv2.imshow('opening', opening)
cv2.waitKey()
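
Whether THRESH_BINARY or THRESH_BINARY_INV gives black text on a white background depends on the polarity of the input image. As a rough sanity check, the sketch below flips a binary image when most of its pixels are black. The function name ensure_dark_text_on_light is hypothetical, and the check assumes the text covers less than half of the image and that the array holds only 0 and 255.

```python
import numpy as np


def ensure_dark_text_on_light(binary):
    """Return the image with the majority (background) pixels white.

    Assumes `binary` is a 2-D uint8 array containing only 0 and 255,
    such as the output of an Otsu threshold, and that text occupies
    less than half of the image.
    """
    white = np.count_nonzero(binary == 255)
    if white * 2 < binary.size:
        # Mostly black, so the background is dark: flip the polarity.
        return 255 - binary
    return binary
```

It could be applied right before OCR, for example opening = ensure_dark_text_on_light(opening).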

Upvotes: 2
