Reputation: 1720
I a working on a text recognition project. I have built a classifier using TensorFlow to predict digits but I would like to implement a more complex algorithm of text recognition by using text localization and text segmentation (separating each character) but I didn't find an implementation for those parts of the algorithms.
So, do you know some algorithms/implementation/tips I, using TensorFlow, to localize text and do text segmentation in natural scenes pictures (actually localize and segmentation of text in the scoreboard for sports pictures)?
Thank you very much for any help.
Upvotes: 0
Views: 2534
Reputation: 2866
After you are done with Object Detection, you can perform text detection which can be passed on to tesseract. There can multiple variation to enhance image before passing it to detector function.
Reference Papers https://arxiv.org/abs/1704.03155v2 https://arxiv.org/pdf/2002.07662.pdf
def text_detector(image):
#hasFrame, image = cap.read()
orig = image
(H, W) = image.shape[:2]
(newW, newH) = (640, 320)
rW = W / float(newW)
rH = H / float(newH)
image = cv2.resize(image, (newW, newH))
(H, W) = image.shape[:2]
layerNames = [
"feature_fusion/Conv_7/Sigmoid",
"feature_fusion/concat_3"]
blob = cv2.dnn.blobFromImage(image, 1.0, (W, H),
(123.68, 116.78, 103.94), swapRB=True, crop=False)
net.setInput(blob)
(scores, geometry) = net.forward(layerNames)
(numRows, numCols) = scores.shape[2:4]
rects = []
confidences = []
for y in range(0, numRows):
scoresData = scores[0, 0, y]
xData0 = geometry[0, 0, y]
xData1 = geometry[0, 1, y]
xData2 = geometry[0, 2, y]
xData3 = geometry[0, 3, y]
anglesData = geometry[0, 4, y]
# loop over the number of columns
for x in range(0, numCols):
# if our score does not have sufficient probability, ignore it
if scoresData[x] < 0.5:
continue
# compute the offset factor as our resulting feature maps will
# be 4x smaller than the input image
(offsetX, offsetY) = (x * 4.0, y * 4.0)
# extract the rotation angle for the prediction and then
# compute the sin and cosine
angle = anglesData[x]
cos = np.cos(angle)
sin = np.sin(angle)
# use the geometry volume to derive the width and height of
# the bounding box
h = xData0[x] + xData2[x]
w = xData1[x] + xData3[x]
# compute both the starting and ending (x, y)-coordinates for
# the text prediction bounding box
endX = int(offsetX + (cos * xData1[x]) + (sin * xData2[x]))
endY = int(offsetY - (sin * xData1[x]) + (cos * xData2[x]))
startX = int(endX - w)
startY = int(endY - h)
# add the bounding box coordinates and probability score to
# our respective lists
rects.append((startX, startY, endX, endY))
confidences.append(scoresData[x])
boxes = non_max_suppression(np.array(rects), probs=confidences)
for (startX, startY, endX, endY) in boxes:
startX = int(startX * rW)
startY = int(startY * rH)
endX = int(endX * rW)
endY = int(endY * rH)
# draw the bounding box on the image
cv2.rectangle(orig, (startX, startY), (endX, endY), (0, 255, 0), 3)
return orig
Upvotes: 0
Reputation: 4236
To group elements on a page, like paragraphs of text and images, you can use some clustering algo, and/or blob detection with some tresholds.
You can use Radon transform to recognize lines and detect skew of a scanned page.
I think that for character separation you will have to mess with fonts. Some polynomial matching/fitting or something. (this is a very wild guess for now, don't take it seriously). But similar aproach would allow you to get the character out of the line and recognize it in same step.
As for recognition, once you have a character, there is a nice trigonometric trick of comparing angles of the character to the angles stored in a database. Works great on handwriting too.
I am not an expert on how page segmentation exactly works, but it seems that I am on my way to become one. Just working on a project including it. So give me a month and I'll be able to tell you more. :D
Anyway, you should go and read Tesseract code to see how HP and Google did it there. It should give you pretty good ideas.
Good luck!
Upvotes: 1