Vikram Murthy

Reputation: 323

cv2 contours unable to detect some shapes

I am trying to extract characters from a form for OCR. After experimenting with connected components, MSER, and contours, I found contours to be the most reliable. The problem is that it sometimes fails to detect shapes that are very similar to ones it has already detected. For instance, in the attached image, the "A" in row 1, column 4 is undetected, while the "A" just two columns away is detected. The same happens for the "A" in row 3 (column 3 vs. column 7). Contoured image (thin green borders)

Here's the code I am using to get the above:

import cv2

im = cv2.imread('IMAGES/ACH0.png')
imgray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
imgray = cv2.GaussianBlur(imgray, (5, 5), 0)
# With THRESH_OTSU the threshold is chosen automatically, so the 127 is ignored
ret, thresh = cv2.threshold(imgray, 127, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
# findContours returns 3 values in OpenCV 3.x and 2 in 4.x; [-2:] works with both
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2:]
for cnt in contours:
    # Skip very large blobs
    if cv2.contourArea(cnt) > 10000:
        continue
    x, y, w, h = cv2.boundingRect(cnt)
    cv2.rectangle(im, (x, y), (x + w, y + h), (0, 255, 0), 1)

I tried reading up on the inner workings of the cv2 implementation of findContours but couldn't find any resources on it (if I could, I would at least be able to debug and understand why this happens). Any pointers would be gratefully acknowledged.

Upvotes: 0

Views: 157

Answers (1)

user1196549

Characters that touch the grid cannot be isolated because they belong to a larger blob.

As the grid seems to be well aligned, you can try to locate the grid lines and erase them before performing OCR.

Upvotes: 2
