Yohan D

Reputation: 970

Improve a picture to detect the characters within an area

My goal is to detect the characters on images of this kind: [input image]

I need to improve the image so that Tesseract recognises it better, probably by doing the following steps:

Here is the output image I get, which is not ideal for OCR: [output image]

OCR problems:

Any other suggestions to improve the recognition are welcome

Upvotes: 3

Views: 2368

Answers (4)

jcupitt

Reputation: 11190

Here's a slightly different approach using pyvips.

If the image is just rotated (i.e. little or no perspective distortion), you can take the FFT to find the angle of rotation. The nice, regular grid of characters will produce a clear set of lines on the transform, so it should be very robust. This does the FFT on the entire image, but you could shrink the image a bit first if you want more speed.

import sys
import pyvips

image = pyvips.Image.new_from_file(sys.argv[1])

# to monochrome, take the fft, wrap the origin to the centre, get magnitude
fft = image.colourspace('b-w').fwfft().wrap().abs()

Making:

[image: FFT magnitude spectrum]

To find the angle of the lines, turn from polar to rectangular coordinates and look for horizontals:

def to_rectangular(image):
    xy = pyvips.Image.xyz(image.width, image.height)
    xy *= [1, 360.0 / image.height]
    index = xy.rect()
    scale = min(image.width, image.height) / float(image.width)
    index *= scale / 2.0
    index += [image.width / 2.0, image.height / 2.0]
    return image.mapim(index)

# sum of columns, sum of rows
cols, rows = to_rectangular(fft).project()

Making:

[image: FFT unrolled into rectangular coordinates]

With a projection of:

[image: row projection]

Then just look for the peak and rotate:

# blur the rows projection a bit, then get the maxpos
v, x, y = rows.gaussblur(10).maxpos()

# and turn to an angle in degrees we should counter-rotate by
angle = 270 - 360 * y / rows.height

image = image.rotate(angle)

[image: rotated image]

To crop, I took the horizontal and vertical projections again, then searched for peaks with B > G.

cols, rows = image.project() 

h = (cols[2] - cols[1]) > 10000
v = (rows[2] - rows[1]) > 10000

# search in from the edges for the first non-zero value
cols, rows = h.profile()
left = rows.avg()

cols, rows = h.fliphor().profile()
right = h.width - rows.avg()
width = right - left

cols, rows = v.profile()
top = cols.avg()

cols, rows = v.flipver().profile()
bottom = v.height - cols.avg()
height = bottom - top

# move the crop in by a margin
margin = 10
left += margin
top += margin
width -= 2 * margin
height -= 2 * margin

# and crop!
image = image.crop(left, top, width, height)

To make:

[image: cropped image]

And finally to remove the background, blur with a large radius and subtract:

image = image.colourspace('b-w')
image = image.gaussblur(70) - image

To make:

[image: final background-subtracted result]

Upvotes: 2

Kinght 金

Reputation: 18331

Here are my steps to recognize the chars:

(1) detect the blue in HSV space, approximate the inner blue contour and sort the corner points
(2) find the perspective transform matrix and do the perspective transform
(3) threshold it (and find the characters)
(4) use `mnist` algorithms to recognize the chars

step (1) find the corners of the blue rect

Choosing the correct upper and lower HSV boundaries for color detection with `cv::inRange` (OpenCV)

[image: blue mask and detected corners]

step (2) crop

[image: cropped result]

step (3) threshold (and find the chars)

[image: thresholded image and detected characters]

step (4) work in progress...

Upvotes: 1

Mark Setchell

Reputation: 207465

Here's one idea for a way to proceed...

Convert to HSV, then start in each corner and progress towards the middle of the picture looking for the nearest pixel to each corner that is somewhat saturated and has a hue matching your blueish surrounding rectangle. That will give you the 4 points marked in red:

[image: corner points marked in red]

Now use a perspective transform to shift each of those points to the corner to make the image rectilinear. I used ImageMagick but you should be able to see that I translate the top-left red dot at coordinates (210,51) into the top-left of the new image at (0,0). Likewise, the top-right red dot at (1754,19) gets shifted to (2064,0). The ImageMagick command in Terminal is:

convert wordsearch.jpg \
  -distort perspective '210,51,0,0 1754,19,2064,0 238,1137,0,1161 1776,1107,2064,1161' result.jpg

That results in this:

[image: perspective-corrected image]

The next issue is uneven lighting - namely the bottom-left is darker than the rest of the image. To offset this, I clone the image and blur it to remove high frequencies (just a box-blur, or box-average is fine) so it now represents the slowly varying illumination. I then subtract the image from this so I am effectively removing background variations and leaving only high-frequency things - like your letters. I then normalize the result to make whites white and blacks black and threshold at 50%.

convert result.jpg -colorspace gray \( +clone -blur 50x50 \) \
   -compose difference -composite  -negate -normalize -threshold 50% final.jpg

[image: final thresholded result]

The result should be good for template matching if you know the font and letters or for OCR if you don't.

Upvotes: 2

ARR

Reputation: 2308

I think it's better to remove the color instead of cropping.

It could be done with OpenCV; see: python - opencv morphologyEx remove specific color

Upvotes: 0
