Reputation: 141
I wrote the code below to split handwritten text into individual letters, but in some cases it fails to separate them properly:
import cv2
import numpy as np
import imutils
from google.colab.patches import cv2_imshow
image = cv2.imread("/content/IMG_3789.JPG")
image = imutils.resize(image, height = 500)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.medianBlur(gray, 5)
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY_INV, 11, 8)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
dilate = cv2.dilate(thresh, kernel, iterations=10)
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area > 500:
        x,y,w,h = cv2.boundingRect(c)
        ROI = image[y:y+h, x:x+w]
        cv2_imshow(ROI)
        break
img_gray = cv2.cvtColor(ROI, cv2.COLOR_BGR2GRAY)
img_gauss = cv2.GaussianBlur(img_gray, (3,3), 0)
kernel = np.ones((4,4), np.uint8)
erode = cv2.erode(img_gauss, kernel, iterations=1)
th3 = cv2.adaptiveThreshold(erode,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,51,10)
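# findContours returns 2 values in OpenCV 4.x and 3 values in OpenCV 3.x
ctrs = cv2.findContours(th3.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
ctrs = ctrs[0] if len(ctrs) == 2 else ctrs[1]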
rects = [cv2.boundingRect(ctr) for ctr in ctrs]
rects.sort()
x, y, w, h = rects[0]
cv2.rectangle(ROI, (x, y), (x+w, y+h), (0, 255, 0), 3)
cv2_imshow(ROI)
The first character has two letters:
The fifth too:
Is it possible to split these letters correctly?
Upvotes: 0
Views: 1402
Reputation:
It seems that the first two letters belong to unconnected blobs (unless your preprocessing makes them touch). So splitting shouldn't be a problem.
For the last two letters, there is no real solution using only "dumb" preprocessing functions. The width is not a reliable criterion, and even if you detected two characters, you don't know exactly where to split.
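To illustrate why the split position is ambiguous, here is a rough sketch of a common heuristic, the vertical projection profile, on a synthetic blob. The function name `candidate_cuts`, the `max_ink` threshold, and the toy image are all assumptions for the example; on real handwriting the thin columns are often missing or misleading.

```python
import numpy as np

def candidate_cuts(binary, max_ink=1):
    """Return columns inside the blob whose ink count is at most max_ink."""
    profile = (binary > 0).sum(axis=0)      # ink pixels per column
    ink_cols = np.flatnonzero(profile > 0)
    if ink_cols.size == 0:
        return []
    lo, hi = ink_cols[0], ink_cols[-1]      # horizontal extent of the blob
    thin = np.flatnonzero(profile <= max_ink)
    # Ignore the empty margins; keep only thin columns strictly inside the blob.
    return [int(x) for x in thin if lo < x < hi]

# Two synthetic "letters" joined by a 1-pixel-thick ligature (assumption).
img = np.zeros((20, 30), np.uint8)
img[2:18, 3:12] = 255
img[2:18, 18:27] = 255
img[10, 12:18] = 255

cuts = candidate_cuts(img)
print(cuts)  # prints [12, 13, 14, 15, 16, 17]
```

Even in this idealized case the heuristic yields six equally plausible cut columns, and it gives nothing useful when the strokes overlap vertically or the join is as thick as the strokes themselves.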
You have to either design criteria to tell which pieces of the blob are character-like (this is very hard), or perform partial recognition, possibly with multiple hypotheses, keeping the most likely one.
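The multiple-hypothesis idea can be sketched as a small dynamic program: over-segment the blob into candidate cuts, score every span between two cuts with a recognizer, and keep the best-scoring chain. Everything here is hypothetical: `best_segmentation`, the cut positions, and the `scores` table stand in for a real classifier's confidences, which you would have to supply yourself.

```python
import math

def best_segmentation(cuts, score_segment):
    """Choose the chain of cut positions that maximizes the summed log-score.

    cuts: sorted x positions, including the left and right borders of the blob.
    score_segment(a, b): hypothetical confidence in (0, 1] that columns a..b
    contain exactly one character.
    """
    n = len(cuts)
    best = [-math.inf] * n   # best[j]: best total log-score of a chain ending at cuts[j]
    back = [-1] * n          # back[j]: index of the previous cut in that chain
    if n:
        best[0] = 0.0
    for j in range(1, n):
        for i in range(j):
            s = best[i] + math.log(score_segment(cuts[i], cuts[j]))
            if s > best[j]:
                best[j], back[j] = s, i
    # Walk the back-pointers from the right border to recover the chosen cuts.
    path, j = [], n - 1
    while j != -1:
        path.append(cuts[j])
        j = back[j]
    return path[::-1]

# Made-up recognizer confidences for each candidate span (assumption).
scores = {(0, 9): 0.9, (9, 20): 0.8, (0, 20): 0.3,
          (0, 14): 0.2, (14, 20): 0.4, (9, 14): 0.1}
cuts = [0, 9, 14, 20]
result = best_segmentation(cuts, lambda a, b: scores.get((a, b), 1e-6))
print(result)  # prints [0, 9, 20]
```

Here the two-character hypothesis split at x = 9 wins over both the single-character hypothesis and the other cuts. The quality of the result depends entirely on the recognizer behind `score_segment`, which is the hard part.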
Handwriting segmentation is extremely challenging.
Upvotes: 3