Reputation: 970
My goal is to detect the characters on images of this kind.
I need to improve the image so that Tesseract recognizes it better, probably by doing the following: rotate the image and fill the blanks, crop it, convert it to grayscale, threshold and blur it, and then use Tesseract to detect the characters.
import cv2
import numpy as np
import pytesseract
from PIL import Image

img = Image.open('grid.jpg')
# PIL gives RGB; reverse the channel order to get the BGR layout OpenCV expects
image = np.array(img.convert("RGB"))[:, :, ::-1].copy()
# Need to rotate the image here and fill the blanks
# Need to crop the image here
# Convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Otsu's thresholding
ret3, th3 = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# Gaussian blur
blur = cv2.GaussianBlur(th3, (5, 5), 0)
# Save the preprocessed image
cv2.imwrite("preprocessed.jpg", blur)
# Apply the OCR
pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
tessdata_dir_config = r'--tessdata-dir "C:/Program Files (x86)/Tesseract-OCR/tessdata" --psm 6'
preprocessed = Image.open('preprocessed.jpg')
boxes = pytesseract.image_to_data(preprocessed, config=tessdata_dir_config)
Here is the output image I get, which is still not ideal for OCR:
OCR problems:
Any other suggestions to improve the recognition are welcome
Upvotes: 3
Views: 2368
Reputation: 11190
Here's a slightly different approach using pyvips.
If the image is just rotated (i.e. little or no perspective), you can take the FFT to find the angle of rotation. The nice, regular grid of characters produces a clear set of lines in the transform, so it should be very robust. This takes the FFT of the entire image, but you could shrink the image a bit first if you want more speed.
import sys
import pyvips
image = pyvips.Image.new_from_file(sys.argv[1])
# to monochrome, take the fft, wrap the origin to the centre, get magnitude
fft = image.colourspace('b-w').fwfft().wrap().abs()
Making:
To find the angle of the lines, turn from polar to rectangular coordinates and look for horizontals:
def to_rectangular(image):
    xy = pyvips.Image.xyz(image.width, image.height)
    xy *= [1, 360.0 / image.height]
    index = xy.rect()
    scale = min(image.width, image.height) / float(image.width)
    index *= scale / 2.0
    index += [image.width / 2.0, image.height / 2.0]
    return image.mapim(index)
# sum of columns, sum of rows
cols, rows = to_rectangular(fft).project()
Making:
With a projection of:
Then just look for the peak and rotate:
# blur the rows projection a bit, then get the maxpos
v, x, y = rows.gaussblur(10).maxpos()
# and turn to an angle in degrees we should counter-rotate by
angle = 270 - 360 * y / rows.height
image = image.rotate(angle)
To crop, I took the horizontal and vertical projections again, then searched for peaks where the blue band exceeds the green band (B > G).
cols, rows = image.project()
h = (cols[2] - cols[1]) > 10000
v = (rows[2] - rows[1]) > 10000
# search in from the edges for the first non-zero value
cols, rows = h.profile()
left = rows.avg()
cols, rows = h.fliphor().profile()
right = h.width - rows.avg()
width = right - left
cols, rows = v.profile()
top = cols.avg()
cols, rows = v.flipver().profile()
bottom = v.height - cols.avg()
height = bottom - top
# move the crop in by a margin
margin = 10
left += margin
top += margin
width -= 2 * margin
height -= 2 * margin
# and crop!
image = image.crop(left, top, width, height)
To make:
And finally to remove the background, blur with a large radius and subtract:
image = image.colourspace('b-w').gaussblur(70) - image
To make:
Upvotes: 2
Reputation: 18331
Here are my steps to recognize the chars:
(1) detect the blue in HSV space, approximate the inner blue contour and sort the corner points:
(2) find the perspective transform matrix and do the perspective transform
(3) threshold it (and find characters)
(4) use `mnist` algorithms to recognize the chars
step (1) find the corners of the blue rect
Choosing the correct upper and lower HSV boundaries for color detection with `cv::inRange` (OpenCV)
step (2) crop
step (3) threshold (and find the chars)
step (4) still a work in progress...
Upvotes: 1
Reputation: 207465
Here's one idea for a way to proceed...
Convert to HSV, then start in each corner and progress towards the middle of the picture looking for the nearest pixel to each corner that is somewhat saturated and has a hue matching your blueish surrounding rectangle. That will give you the 4 points marked in red:
Now use a perspective transform to shift each of those points to the corner to make the image rectilinear. I used ImageMagick but you should be able to see that I translate the top-left red dot at coordinates (210,51) into the top-left of the new image at (0,0). Likewise, the top-right red dot at (1754,19) gets shifted to (2064,0). The ImageMagick command in Terminal is:
convert wordsearch.jpg \
-distort perspective '210,51,0,0 1754,19,2064,0 238,1137,0,1161 1776,1107,2064,1161' result.jpg
That results in this:
The next issue is uneven lighting - namely the bottom-left is darker than the rest of the image. To offset this, I clone the image and blur it to remove high frequencies (just a box-blur, or box-average is fine) so it now represents the slowly varying illumination. I then subtract the image from this so I am effectively removing background variations and leaving only high-frequency things - like your letters. I then normalize the result to make whites white and blacks black and threshold at 50%.
convert result.jpg -colorspace gray \( +clone -blur 50x50 \) \
    -compose difference -composite -negate -normalize -threshold 50% final.jpg
The result should be good for template matching if you know the font and letters or for OCR if you don't.
Upvotes: 2
Reputation: 2308
I think it's better to remove the color instead of cropping.
It could be done with OpenCV; see: python - opencv morphologyEx remove specific color
Upvotes: 0