Reputation: 21
I started with Python recently and decided that the best way to learn it is by solving a real problem rather than just following tutorials. I'm trying to write a Python program that will help me solve Nonograms, using a webcam capturing my smartphone's screen.
I have most of the parts done; however, I'm struggling with OCR. Let me show you what I have so far.
I'm using OpenCV to read the camera (it is held in a 3D printed arm, so it keeps the same distance from the phone's screen and pretty much the same lighting conditions). Then I preprocess the frame (cv2.COLOR_BGR2GRAY, cv2.GaussianBlur, cv2.adaptiveThreshold, etc.) and find contours (cv2.findContours), checking that they are closed polygons, have 4 corners and cover at least 30% of the screen area (to prevent smaller items from being picked up). Here is the result:
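A minimal sketch of this detection step (the threshold parameters, the approxPolyDP epsilon and the 30% cut-off below are illustrative, not necessarily the exact values I use):

import cv2

def find_screen_contour(frame):
    grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(grey, (5, 5), 0)
    thresh = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 11, 2)
    cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    min_area = 0.3 * frame.shape[0] * frame.shape[1]   # at least 30% of the frame
    for c in sorted(cnts, key=cv2.contourArea, reverse=True):
        if cv2.contourArea(c) < min_area:
            break
        peri = cv2.arcLength(c, True)
        approx = cv2.approxPolyDP(c, 0.02 * peri, True)
        if len(approx) == 4:                            # closed 4-corner polygon
            return approx.reshape(4, 2)
    return None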
Then I extract the selection, warp it (to get a nice rectangle in case the camera/phone was tilted) and use a similar approach to find contours and split the image into 4 sections:
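The warp itself is a standard four-point perspective transform; a sketch of it (the corner ordering by sum/difference and the output size are assumptions):

import numpy as np
import cv2

def warp_screen(frame, corners, out_w=1080, out_h=1920):
    pts = np.array(corners, dtype='float32')
    ordered = np.zeros((4, 2), dtype='float32')
    s = pts.sum(axis=1)
    d = np.diff(pts, axis=1).ravel()
    ordered[0] = pts[np.argmin(s)]   # top-left
    ordered[2] = pts[np.argmax(s)]   # bottom-right
    ordered[1] = pts[np.argmin(d)]   # top-right
    ordered[3] = pts[np.argmax(d)]   # bottom-left
    dst = np.array([[0, 0], [out_w - 1, 0], [out_w - 1, out_h - 1], [0, out_h - 1]],
                   dtype='float32')
    M = cv2.getPerspectiveTransform(ordered, dst)
    return cv2.warpPerspective(frame, M, (out_w, out_h))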
Let's focus on the vertical rules. I process them again horizontally (GaussianBlur with a large value, > 1000, to make the numbers disappear but leave the lines) and vertically, apply a threshold and do contour detection to find the number of columns and rows (in this example, it is 15x5). Here are the results:
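A rough sketch of that idea, assuming a strongly elongated Gaussian kernel is what smears the digits away while the cell boundaries survive (the kernel size here is a guess):

import cv2

def count_cells(strip, axis='vertical'):
    grey = cv2.cvtColor(strip, cv2.COLOR_BGR2GRAY)
    # a very elongated kernel: (1, N) smears along y, (N, 1) smears along x
    ksize = (1, 101) if axis == 'vertical' else (101, 1)
    blurred = cv2.GaussianBlur(grey, ksize, 0)
    thresh = cv2.threshold(blurred, 0, 255,
                           cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    return len(cnts)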
After that I have coordinates for all the squares in a 2-dimensional array. I do the same exercise for the horizontal rules and the play area. Finally, I extract all small squares from the image and store them in an array.
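The extraction itself is just slicing the warped image with those coordinates; a minimal sketch, assuming the cells are stored as (x, y, w, h) boxes in a 2-dimensional list (names are illustrative):

def extract_squares(warped, cells):
    squares = []
    for row in cells:
        squares.append([warped[y:y + h, x:x + w] for (x, y, w, h) in row])
    return squares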
And now, the fun begins: I loop through the extracted images and apply OCR (so far EasyOCR works better than Tesseract); a sketch of that loop follows after the arrays below. The result goes into a same-shaped array, but as extracted text. After some processing (i.e. only numbers, correct count, totals from both rules add up, etc.) I get 2 arrays like this:
vertical rules: [[2], [1], [1, 1], [3, 2], [8], [6], [1, 4], [1, 1], [1], [1]]
horizontal rules: [[2], [2], [9], [1, 3], [1, 3], [2], [1, 1], [1, 1], [1, 1], [2, 2]]
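The OCR loop, sketched with EasyOCR (the Reader setup and the allowlist restricting recognition to digits are how I would write it, not necessarily the exact code I run):

import easyocr

reader = easyocr.Reader(['en'], gpu=False)

def read_rules(squares):
    rules = []
    for row in squares:
        texts = []
        for cell in row:
            # detail=0 returns plain strings; allowlist keeps only digits
            result = reader.readtext(cell, allowlist='0123456789', detail=0)
            texts.append(result[0] if result else '')
        rules.append(texts)
    return rules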
I pass them to a nonogram solver class that gives me the output in the form of an array holding 0/1 values (only black & white for the moment) representing the solved nonogram. I use it to highlight cells in the original camera view to show the solution:
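The highlighting step, sketched under the assumption that the play-area cell boxes and the solver's 0/1 grid are available (names and the blending factor are illustrative):

import cv2

def highlight_solution(frame, cells, solution):
    overlay = frame.copy()
    for r, row in enumerate(cells):
        for c, (x, y, w, h) in enumerate(row):
            if solution[r][c] == 1:
                cv2.rectangle(overlay, (x, y), (x + w, y + h), (0, 255, 0), -1)
    # blend the filled rectangles with the original frame for a translucent highlight
    return cv2.addWeighted(overlay, 0.4, frame, 0.6, 0)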
However, with smaller fonts (like in the original image (15x15 board) or even in some 10x10 boards) the OCR can't handle the text. Depending on the font, 1 is most often confused with 7, 8 with 0, sometimes 4 with 0, etc. I tried various preprocessing steps to extract edges, blur the numbers, etc., but the results weren't great.
How can I get the OCR to give me better results? Please see some full-resolution (the camera gives a 1920x1080 frame) extracted numbers from the rules section (I will attach more once I get more reputation points :)):
If you could help me find a better way of extracting text (only numbers are possible, 0..30 is more than enough) from the attached squares above, that would be great!
Upvotes: 1
Views: 3665
Reputation: 21
Thanks @nathancy.
I ended up with the following function that accepts the cut-out square with a single rule, processes it and returns a cleaner version for the OCR:
import cv2

def clean_square_for_OCR(image):
    grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu threshold to a white-on-black binary image
    thresh = cv2.threshold(grey, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    # Morph open to remove noise
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
    opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
    # Find contours and remove small noise
    cnts = cv2.findContours(opening, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    for c in cnts:
        area = cv2.contourArea(c)
        if area < 50:
            # paint small specks black (filled contour) so they don't confuse the OCR
            cv2.drawContours(opening, [c], -1, 0, -1)
    # Invert back to black-on-white and apply a slight Gaussian blur
    result = 255 - opening
    result = cv2.GaussianBlur(result, (5, 5), 0)
    return result
And then, within the main code, I had to tweak the Tesseract parameters, such as "--psm 8" (treat the image as a single word), restrict the possible character set, and finally strip the trailing newline/form-feed if no text was found (blank square):
processed_square = wpi.clean_square_for_OCR(square)
ocr_result = pytesseract.image_to_string(processed_square, lang='eng', config='--psm 8 -c tessedit_char_whitelist=0123456789').replace('\n\x0c', '')
Upvotes: 1