Reputation: 3535
I am trying to extract text from the image below. I tried OCR in Python, but it is giving me incorrect results.
I preprocessed the image: removed the underline, applied a Canny edge detector, increased the contrast, and then fed it to OCR. Still, I am not getting the expected output.
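Roughly, that preprocessing looked like this (a sketch with illustrative parameter values, not my exact code):
import cv2
import pytesseract

im = cv2.imread("image.png")
gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
gray = cv2.convertScaleAbs(gray, alpha=1.5, beta=0)  # boost contrast
edges = cv2.Canny(gray, 50, 150)                     # edge map of the glyphs
print(pytesseract.image_to_string(edges))            # still comes out wrong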
With my limited knowledge, I tried to separate the characters out of the image after increasing the contrast:
import cv2
import numpy as np
import os

image_path = os.path.join(os.path.dirname(__file__), "image.png")
im = cv2.imread(image_path)
gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)

# Binarize: map intermediate pixels to pure black or white
gray[gray < 100] = 0
gray[gray >= 100] = 255

# Crop all-white rows/columns and all-black rows (the underline)
gray = gray[~np.all(gray == 255, axis=1)]
gray = gray[:, ~np.all(gray == 255, axis=0)]
gray = gray[~np.all(gray == 0, axis=1)]

# Debug output: where the all-white separator columns are
print(np.where(np.all(gray == 255, axis=0)))
print(gray[:, 20:33])

# Split at the all-white columns, then save each non-empty piece
words = np.hsplit(gray, np.where(np.all(gray == 255, axis=0))[0])
i = 0
for word in words:
    word = word[:, ~np.all(word == 255, axis=0)]
    if word.size:
        print(word.shape)
        i = i + 1
        cv2.imwrite("temp" + str(i) + ".png", word)
It became like this:
I then gave these as input to pytesseract again, but it gave me blank output.
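For reference, the pytesseract call was essentially this (a sketch, not my exact code; Tesseract's --psm 10 single-character mode is one setting worth experimenting with for crops like these):
import pytesseract
from PIL import Image

# page segmentation mode 10 = "treat the image as a single character"
text = pytesseract.image_to_string(Image.open("temp1.png"), config="--psm 10")
print(repr(text))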
Here are my doubts (short pointers are OK if this seems too broad): what is the best approach to tackle this kind of problem?
Upvotes: 2
Views: 3577
Reputation: 2601
This answer implements what is said in my comment.
I changed your code a little and refrained from using OpenCV. The code is written using Python 3.5.
To extract the digits, I sum the image column-wise and scale the resulting array to get check. I operate here on the gray image that you already cut, which effectively gets rid of the underline.
x_sum = np.sum(gray, axis=0)          # column-wise sum: high = mostly white
check = x_sum / np.max(x_sum) * 10    # scale to the range 0..10
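To make the scaling concrete, here is a tiny worked example on a hypothetical one-row image (background columns score 10, inked columns score low):
import numpy as np

gray = np.array([[255, 0, 0, 255, 255]])   # white, ink, ink, white, white
x_sum = np.sum(gray, axis=0)               # [255   0   0 255 255]
check = x_sum / np.max(x_sum) * 10         # [10.  0.  0. 10. 10.]
print((check < 8).astype(int))             # [0 1 1 0 0] -> 1 marks ink columns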
This array can now be compared against a threshold to identify the regions where a letter/digit is located, such as:
plt.imshow(gray, cmap='gray')
x_sum = np.sum(gray, axis=0)
check = x_sum / np.max(x_sum) * 10
plt.plot((check < 8).astype(int))
plt.show()
Now we will use this information to modify the image, erasing the columns where the thresholded check indicator is 0 (pure background), such as:
for idx, i in enumerate((check < 8).astype(int)):
    if i < 1:
        gray[:, idx] = 255
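As an aside, the same erase can be written as a single boolean-mask assignment (an equivalent rewrite, not the code used above):
gray[:, check >= 8] = 255   # blank every pure-background column at once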
Therefore we have this image:
This can be further processed just as you are already doing. It provides separated letters/digits, which can then be post-processed for learning.
The next step to work on is scaling/resizing the letters/images so that each one is described by the same number of features.
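A minimal sketch of that normalization, assuming each word crop from the loop above is resized to a fixed 8x8 grid (the target size is my choice, picked to match scikit-learn's digits set used below):
from PIL import Image
import numpy as np

def to_features(word, size=(8, 8)):
    # Resize one glyph crop to a fixed grid and flatten to a feature vector
    img = Image.fromarray(word.astype(np.uint8)).resize(size)
    return np.array(img, dtype=float).flatten()   # size[0]*size[1] features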
Then, finally, you can use a pretrained classifier to predict the most probable letters/digits.
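For example, a k-NN model fit on scikit-learn's bundled 8x8 digits set could serve as that classifier (a sketch under my own assumptions: digits only, and the crops must be inverted and rescaled to the 0-16 intensity range the dataset uses):
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
clf = KNeighborsClassifier(n_neighbors=3).fit(digits.data, digits.target)

# `word` is one glyph crop; to_features is the helper sketched above.
# Invert (the dataset has bright ink on a dark background) and rescale.
features = (255 - to_features(word)) / 255 * 16
print(clf.predict([features]))   # most probable digit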
The full code is provided here:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec
from PIL import Image

# Load the image and convert it to a grayscale integer array
image = Image.open("testl.png")
f = image.convert('I')
gray = np.array(f)

# Binarize: everything darker than 200 becomes black, the rest white
gray[gray < 200] = 0
gray[gray >= 200] = 255

# Crop all-white rows/columns and all-black rows (the underline)
gray = gray[~np.all(gray == 255, axis=1)]
gray = gray[:, ~np.all(gray == 255, axis=0)]
gray = gray[~np.all(gray == 0, axis=1)]

# Column-wise sum scaled to 0..10, plus the thresholded indicator
plt.imshow(gray, cmap='gray')
x_sum = np.sum(gray, axis=0)
check = x_sum / np.max(x_sum) * 10
plt.plot((check < 8).astype(int))
plt.show()

plt.matshow(gray)
plt.show()

# Erase the pure-background columns (indicator is 0 there)
for idx, i in enumerate((check < 8).astype(int)):
    if i < 1:
        gray[:, idx] = 255

plt.matshow(gray)
plt.show()

# Split at the all-white columns and plot each non-empty letter/digit
words = np.hsplit(gray, np.where(np.all(gray >= 200, axis=0))[0])
gs = gridspec.GridSpec(1, len(words))
fig = plt.figure(figsize=(len(words), 1))
i = 0
for word in words:
    word = word[:, ~np.all(word >= 230, axis=0)]
    if word.size:
        ax = fig.add_subplot(gs[i])
        print(word.shape)
        i = i + 1
        ax.matshow(word, aspect='auto')
plt.show()
This finally yields all the separated letters/digits, such as:
Upvotes: 1