Arunesh Singh

Reputation: 3535

Image recognition from Computer Screen

I am trying to extract text from the image below. I tried OCR in Python, but it gives me incorrect results.

Test image

I preprocessed the image: removed the underline, applied the Canny edge detector, increased the contrast, and then fed it to OCR. Still, I am not getting the expected output.

With my limited knowledge, I tried to separate the characters out of the image after increasing the contrast.

import cv2
import numpy as np
import os

image_path = os.path.join(os.path.dirname(__file__), "image.png")

im = cv2.imread(image_path)

gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)


# binarize: map dark pixels to black, everything else to white
gray[gray < 100] = 0
gray[gray >= 100] = 255

# trim all-white rows/columns and drop the all-black underline row
gray = gray[~np.all(gray == 255, axis=1)]
gray = gray[:, ~np.all(gray == 255, axis=0)]
gray = gray[~np.all(gray == 0, axis=1)]

# debug: inspect the split positions
print(np.where(np.all(gray == 255, axis=0)))
print(gray[:, 20:33])

# split the image at every all-white column
words = np.hsplit(gray, np.where(np.all(gray == 255, axis=0))[0])

i = 0
for word in words:
    word = word[:, ~np.all(word == 255, axis=0)]
    if word.size:
        print(word.shape)
        i = i + 1
        cv2.imwrite("temp" + str(i) + ".png", word)
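The `hsplit` call above splits at every all-white column index, so it produces many empty or all-white fragments that the `word.size` check then filters out. A minimal synthetic illustration of that behavior (the array values here are made up for the demo):

```python
import numpy as np

# Synthetic binarized "image": two dark blobs separated by white (255) columns.
img = np.full((5, 9), 255, dtype=np.uint8)
img[:, 1:3] = 0   # first "character"
img[:, 6:8] = 0   # second "character"

# Split at every column that is entirely white.
split_cols = np.where(np.all(img == 255, axis=0))[0]
pieces = np.hsplit(img, split_cols)

# Drop all-white columns inside each piece and keep only non-empty pieces.
chars = []
for piece in pieces:
    piece = piece[:, ~np.all(piece == 255, axis=0)]
    if piece.size:
        chars.append(piece)

print(len(chars))               # → 2
print([c.shape for c in chars]) # → [(5, 2), (5, 2)]
```

Two clean character blocks come out, everything else is filtered away.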

It became like this

Cropped images

Then I gave these crops as input to pytesseract, and it returned blank output.

Here are my doubts.

  1. Is there a better mechanism to separate characters from an image at whitespace? The current approach seems very fragile to me.
  2. How can we preprocess the image so it is better recognized by OCR?
  3. Can we use neural networks or an SVM here, as is done for the MNIST digits dataset?

Short pointers are fine if this seems too broad. What is the best approach to tackle this kind of problem?

Upvotes: 2

Views: 3577

Answers (1)

Nikolas Rieble

Reputation: 2601

This answer implements what is said in my comment.

I changed your code a little and refrained from using OpenCV. The code is written for Python 3.5.

To extract the digits, I sum the image column-wise and scale the resulting array to obtain `check`. I operate on the gray image that you already cropped, which effectively gets rid of the underline.

x_sum = np.sum(gray, axis = 0)
check = ((x_sum)/np.max(x_sum)*10)

This array can now be compared against a threshold to identify the regions where a letter/digit is located:

plt.imshow(gray, cmap='gray')
x_sum = np.sum(gray, axis = 0)
check = ((x_sum)/np.max(x_sum)*10)
plt.plot((check<8).astype(int))
plt.show()
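The idea is that columns containing dark (letter) pixels sum to less than all-white columns, so the scaled profile dips below the threshold exactly where glyphs sit. A small self-contained check of that logic (the threshold of 8 mirrors the answer, but the array here is synthetic):

```python
import numpy as np

gray = np.full((10, 8), 255, dtype=np.uint8)
gray[2:8, 2:4] = 0                 # one dark glyph spanning columns 2-3

x_sum = np.sum(gray, axis=0)       # all-white columns sum to 2550 here
check = x_sum / np.max(x_sum) * 10 # 10.0 for white columns, lower where ink is

mask = (check < 8).astype(int)     # 1 where a glyph column is
print(mask)                        # → [0 0 1 1 0 0 0 0]
```

The mask marks exactly the two glyph columns.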

[Plot: the binarized column-sum profile over the gray image]

Now we use this information to modify the image, erasing the columns where the thresholded check array is 0 (i.e., the all-white background columns):

for idx,i in enumerate((check<8).astype(int)):     
    if i < 1:
        gray[:,idx] = 255

Therefore we have this image:

[Image: the digits with the background columns set to pure white]

This can be further processed just as you are already doing. It yields separated letters/digits, which can then be post-processed for learning.

The next step would be scaling/resizing the letters/digits so that each one is described by the same number of features.
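A simple way to bring every crop to the same feature length is to resize each one to a fixed grid, e.g. 28x28 as in MNIST, and flatten it. A sketch using PIL's resize (the target size and the helper name are assumptions, not part of the answer's code):

```python
import numpy as np
from PIL import Image

def to_feature_vector(word, size=(28, 28)):
    """Resize a 2-D uint8 crop to a fixed grid and flatten to a 1-D vector."""
    img = Image.fromarray(word.astype(np.uint8))
    img = img.resize(size, Image.BILINEAR)
    return np.asarray(img, dtype=np.float32).ravel() / 255.0

crop = np.zeros((13, 7), dtype=np.uint8)  # dummy crop of arbitrary shape
vec = to_feature_vector(crop)
print(vec.shape)                          # → (784,)
```

Crops of any shape now map to vectors of identical length, which is what a classifier needs.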

Finally, you can use a pretrained classifier to predict the most probable letter/digit.
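With fixed-length vectors, any standard classifier works. A hedged sketch with scikit-learn's `SVC`, one of the options the question asks about (the training data here is random noise standing in for labeled glyphs, so the fitted model itself is meaningless):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-in training set: 20 flattened 28x28 "glyphs" with fake labels 0/1.
X_train = rng.random((20, 784))
y_train = np.repeat([0, 1], 10)

clf = SVC(kernel="rbf")          # an SVM, as asked about in the question
clf.fit(X_train, y_train)

pred = clf.predict(rng.random((3, 784)))
print(pred.shape)                # → (3,)
```

In practice you would train on real labeled character crops (or use a model pretrained on something like MNIST/EMNIST for digits and letters).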

The full code is provided here:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec
from PIL import Image

image = Image.open("testl.png")
f = image.convert('I')   # 32-bit integer grayscale

gray = np.array(f)
# binarize
gray[gray < 200] = 0
gray[gray >= 200] = 255

# trim all-white rows/columns and drop the all-black underline row
gray = gray[~np.all(gray == 255, axis=1)]
gray = gray[:, ~np.all(gray == 255, axis=0)]
gray = gray[~np.all(gray == 0, axis=1)]

# column-sum profile, scaled to [0, 10]
plt.imshow(gray, cmap='gray')
x_sum = np.sum(gray, axis=0)
check = x_sum / np.max(x_sum) * 10
plt.plot((check < 8).astype(int))
plt.show()

plt.matshow(gray)
plt.show()

# erase the background columns (where the thresholded profile is 0)
for idx, i in enumerate((check < 8).astype(int)):
    if i < 1:
        gray[:, idx] = 255

plt.matshow(gray)
plt.show()

# split at every all-white column
words = np.hsplit(gray, np.where(np.all(gray >= 200, axis=0))[0])

gs = gridspec.GridSpec(1, len(words))
fig = plt.figure(figsize=(len(words), 1))

i = 0
for word in words:
    word = word[:, ~np.all(word >= 230, axis=0)]
    if word.size:
        ax = fig.add_subplot(gs[i])
        print(word.shape)
        i = i + 1
        ax.matshow(word, aspect='auto')
plt.show()

This finally yields all the separated letters/digits:

[Image: the individual letters/digits plotted side by side]

Upvotes: 1
