Using OpenCV to put bounding boxes around numbers and words

Question

I am attempting to write a program that can detect handwritten numbers and mathematical words such as log and sin. As written, my program can only detect individual symbols, so while numbers are detected just fine, words are detected as separate letters. My current code is below.

import cv2
import numpy as np

img = cv2.imread("example.JPG")

morph = img.copy()

# close then open with a 1x1 kernel (a 1x1 kernel leaves the image unchanged)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 1))
morph = cv2.morphologyEx(morph, cv2.MORPH_CLOSE, kernel)
morph = cv2.morphologyEx(morph, cv2.MORPH_OPEN, kernel)

# tall, narrow kernel (3 wide, 15 high) reused for all steps below
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 15))

# take morphological gradient
gradient_image = cv2.morphologyEx(morph, cv2.MORPH_GRADIENT, kernel)

gray = cv2.cvtColor(gradient_image, cv2.COLOR_BGR2GRAY)

# take this out?
img_grey = cv2.morphologyEx(gray, cv2.MORPH_CLOSE, kernel)

blur = cv2.medianBlur(img_grey, 3)

# binarize with Otsu's automatically chosen threshold
ret, thing = cv2.threshold(blur, 0.0, 255.0, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

img_dilation = cv2.dilate(thing, kernel, iterations=3)

cv2.imwrite("check_equal.jpg", img_dilation)

# [-2] picks the contour list in both OpenCV 3.x and 4.x return formats
conturs_lst = cv2.findContours(img_dilation, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]

# keep bounding boxes that are at least 15 px in both dimensions
coor_lst = []
for cnt in conturs_lst:
    x, y, w, h = cv2.boundingRect(cnt)
    if w < 15 or h < 15:
        continue
    coor_lst.append((x, y, w, h))

How would one go about keeping the behavior the same for numbers while allowing the program to detect that some symbols form words and draw a bounding box around the entire word?

James Gabriel · Accepted Answer

Your problem:

Currently your program isn't identifying numbers, words, or anything else. It is only detecting contours on a page. If you had a smiley face on there, it would detect that too.

Your options are:

1. Make your program understand what it is actually seeing (compare with known contours of each letter/number, use machine learning, etc.) and then parse on that higher-level information. Determining the meaning of a handwritten symbol is a canonical machine learning problem, and so is beyond a simple Stack Overflow answer. Resources on this can be found as solutions to the MNIST dataset. For example, you would feed crops around each of your contours into an [insert ML algorithm] trained on MNIST, which would identify them. You would then use some logic to group symbols into words based on [insert heuristic, probably spacing].
2. Find some simple heuristic that does a very good job of separating the contours of numbers/groups of numbers from the contours of letters/words. This will work in very simple circumstances where you can hand-tune everything. Change the handwriting, style, or spacing and this one goes out the window, but it all depends on your project scope.
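
As a rough illustration of the grouping step only, here is a minimal sketch. It assumes you already have a label for each bounding box (from whatever classifier you end up using), that all symbols sit on a single line of writing, and that letters belonging to one word are separated by at most max_gap pixels. The function name group_letters, the max_gap value, and the sample boxes are all made up for this example and would need tuning for real handwriting.

```python
def group_letters(symbols, max_gap=20):
    """Merge neighbouring letter boxes into word boxes; leave digits alone.

    symbols: list of (label, (x, y, w, h)) where label is a one-character
             string assumed to come from some classifier (hypothetical).
    Returns a list of (text, (x, y, w, h)) ordered left to right.
    """
    merged = []
    # Process symbols left to right (assumes a single line of writing).
    for label, (x, y, w, h) in sorted(symbols, key=lambda s: s[1][0]):
        if merged and label.isalpha():
            text, (px, py, pw, ph) = merged[-1]
            gap = x - (px + pw)  # horizontal space to the previous box
            same_line = min(y + h, py + ph) > max(y, py)  # vertical overlap
            if text.isalpha() and gap <= max_gap and same_line:
                # Replace the previous entry with the union of both boxes.
                nx, ny = min(px, x), min(py, y)
                box = (nx, ny,
                       max(px + pw, x + w) - nx,
                       max(py + ph, y + h) - ny)
                merged[-1] = (text + label, box)
                continue
        # Digits, or letters too far from the previous box, stay separate.
        merged.append((label, (x, y, w, h)))
    return merged


# Made-up boxes for "log" followed by the digits 2 and 5.
sample = [
    ("l", (10, 10, 10, 30)),
    ("o", (25, 20, 15, 20)),
    ("g", (45, 20, 15, 30)),
    ("2", (90, 15, 15, 25)),
    ("5", (110, 15, 15, 25)),
]
result = group_letters(sample)
for text, box in result:
    print(text, box)
```

With these sample values the three letters collapse into one box labelled log, while each digit keeps its own box, which is the behavior asked for in the question. The hard part is still getting reliable labels in the first place.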

Citations: Years of computer vision research https://en.wikipedia.org/wiki/MNIST_database#Dataset
