A. Attia

Reputation: 1720

Text recognition and detection using TensorFlow

I am working on a text recognition project. I have built a classifier using TensorFlow to predict digits, but I would like to implement a more complete text recognition pipeline that adds text localization and text segmentation (separating each character). I have not found an implementation of those two parts.

So, do you know of any algorithms, implementations, or tips for localizing and segmenting text in natural scene pictures using TensorFlow? My actual use case is localizing and segmenting the text on scoreboards in sports pictures.

Thank you very much for any help.

Upvotes: 0

Views: 2534

Answers (2)

ZKS

Reputation: 2866

Once you are done with object detection, you can perform text detection and pass the detected regions on to Tesseract. There are multiple ways to enhance the image before passing it to the detector function.

Reference papers:
https://arxiv.org/abs/1704.03155v2
https://arxiv.org/pdf/2002.07662.pdf

import cv2
import numpy as np
from imutils.object_detection import non_max_suppression

# Assumption: the original snippet never defines `net`. The layer names used
# below match the pre-trained EAST text detector, so that model is loaded
# here; the file path is a placeholder.
net = cv2.dnn.readNet("frozen_east_text_detection.pb")

def text_detector(image):
    # keep the input image (boxes are drawn on it) and its original size
    orig = image
    (H, W) = image.shape[:2]

    # resize to the network input size (both dimensions are multiples of 32)
    # and remember the ratios so boxes can be scaled back afterwards
    (newW, newH) = (640, 320)
    rW = W / float(newW)
    rH = H / float(newH)

    image = cv2.resize(image, (newW, newH))
    (H, W) = image.shape[:2]

    # output layers: text/no-text scores and bounding box geometry
    layerNames = [
        "feature_fusion/Conv_7/Sigmoid",
        "feature_fusion/concat_3"]

    blob = cv2.dnn.blobFromImage(image, 1.0, (W, H),
        (123.68, 116.78, 103.94), swapRB=True, crop=False)

    net.setInput(blob)
    (scores, geometry) = net.forward(layerNames)

    (numRows, numCols) = scores.shape[2:4]
    rects = []
    confidences = []

    # loop over the number of rows
    for y in range(0, numRows):
        scoresData = scores[0, 0, y]
        xData0 = geometry[0, 0, y]
        xData1 = geometry[0, 1, y]
        xData2 = geometry[0, 2, y]
        xData3 = geometry[0, 3, y]
        anglesData = geometry[0, 4, y]

        # loop over the number of columns
        for x in range(0, numCols):
            # if our score does not have sufficient probability, ignore it
            if scoresData[x] < 0.5:
                continue

            # compute the offset factor as our resulting feature maps will
            # be 4x smaller than the input image
            (offsetX, offsetY) = (x * 4.0, y * 4.0)

            # extract the rotation angle for the prediction and then
            # compute the sin and cosine
            angle = anglesData[x]
            cos = np.cos(angle)
            sin = np.sin(angle)

            # use the geometry volume to derive the width and height of
            # the bounding box
            h = xData0[x] + xData2[x]
            w = xData1[x] + xData3[x]

            # compute both the starting and ending (x, y)-coordinates for
            # the text prediction bounding box
            endX = int(offsetX + (cos * xData1[x]) + (sin * xData2[x]))
            endY = int(offsetY - (sin * xData1[x]) + (cos * xData2[x]))
            startX = int(endX - w)
            startY = int(endY - h)

            # add the bounding box coordinates and probability score to
            # our respective lists
            rects.append((startX, startY, endX, endY))
            confidences.append(scoresData[x])

    # suppress weak, overlapping boxes
    boxes = non_max_suppression(np.array(rects), probs=confidences)

    for (startX, startY, endX, endY) in boxes:
        # scale the box back to the original image size
        startX = int(startX * rW)
        startY = int(startY * rH)
        endX = int(endX * rW)
        endY = int(endY * rH)

        # draw the bounding box on the image
        cv2.rectangle(orig, (startX, startY), (endX, endY), (0, 255, 0), 3)
    return orig
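
The function above only draws the boxes. To pass the detected regions on to Tesseract, each box first has to be cropped out of the image. Here is a minimal sketch of that step, assuming the boxes are (startX, startY, endX, endY) tuples already scaled to the original image. `crop_regions` and its `pad` parameter are hypothetical names, not part of the code above, which would have to return the scaled boxes as well:

```python
import numpy as np

def crop_regions(image, boxes, pad=2):
    """Return one cropped sub-image per (startX, startY, endX, endY) box."""
    height, width = image.shape[:2]
    crops = []
    for (start_x, start_y, end_x, end_y) in boxes:
        # Pad the box slightly, then clamp it to the image bounds so the
        # slice never wraps around or runs past the edge.
        x0 = max(0, start_x - pad)
        y0 = max(0, start_y - pad)
        x1 = min(width, end_x + pad)
        y1 = min(height, end_y + pad)
        # Skip boxes that end up empty after clamping.
        if x1 > x0 and y1 > y0:
            crops.append(image[y0:y1, x0:x1])
    return crops
```

Each crop could then be handed to an OCR call, for example `pytesseract.image_to_string(crop, config="--psm 7")` if the pytesseract wrapper is installed. Page segmentation mode 7 tells Tesseract to treat the crop as a single line of text, which is probably a reasonable starting point for scoreboard regions.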

Upvotes: 0

Dalen

Reputation: 4236

To group elements on a page, like paragraphs of text and images, you can use a clustering algorithm and/or blob detection with some thresholds.
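
To make the blob detection idea concrete, here is a rough sketch: threshold the grayscale page, then collect connected dark regions with a flood fill and keep their bounding boxes. The function name `find_blobs`, the threshold of 128, the `min_area` filter, and the assumption of dark ink on a light background are all illustrative choices, not something prescribed by a particular library:

```python
from collections import deque

import numpy as np

def find_blobs(gray, threshold=128, min_area=4):
    """Return bounding boxes (x0, y0, x1, y1) of dark connected regions."""
    mask = np.asarray(gray) < threshold      # assumed: dark ink, light paper
    seen = np.zeros(mask.shape, dtype=bool)
    rows, cols = mask.shape
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue = deque([(r, c)])
            seen[r, c] = True
            x0 = x1 = c
            y0 = y1 = r
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny, nx] and not seen[ny, nx]):
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            # Drop specks smaller than min_area pixels.
            if area >= min_area:
                boxes.append((x0, y0, x1 + 1, y1 + 1))  # end is exclusive
    return boxes
```

On real images, `cv2.connectedComponentsWithStats` does the same job much faster. The resulting boxes are what you would then feed to a clustering step to merge characters into words, lines, and paragraphs.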

You can use the Radon transform to recognize lines and detect the skew of a scanned page.
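
A simplified version of the skew detection idea, as a sketch: for a binary page, projecting the text pixels along a given direction is one slice of the Radon transform, and the slice looks sharpest (highest variance) when the direction matches the text lines. The function name `estimate_skew`, the search range of -10 to 10 degrees, and the half-degree step are assumptions made for this example:

```python
import numpy as np

def estimate_skew(binary, angles=None):
    """Return the candidate angle (degrees) with the sharpest projection.

    Positive angles mean text lines that run downwards to the right in
    image coordinates (y axis pointing down).
    """
    if angles is None:
        angles = np.arange(-10.0, 10.5, 0.5)   # assumed search range
    ys, xs = np.nonzero(binary)                # coordinates of text pixels
    if ys.size == 0:
        return 0.0
    best_angle, best_score = 0.0, -1.0
    for angle in angles:
        theta = np.deg2rad(angle)
        # Offset of every text pixel perpendicular to lines at this angle.
        offsets = ys * np.cos(theta) - xs * np.sin(theta)
        bins = np.round(offsets - offsets.min()).astype(int)
        profile = np.bincount(bins)            # projection at this angle
        score = float(np.var(profile))         # peaky profile = aligned lines
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle
```

Rotating the page back by the returned angle should level the text lines. If you want the full transform rather than this restricted search, `skimage.transform.radon` computes a complete sinogram.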

I think that for character separation you will have to mess with fonts: some polynomial matching/fitting or something similar. (This is a very wild guess for now, so don't take it seriously.) But a similar approach would allow you to get the character out of the line and recognize it in the same step.
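
For comparison, a much simpler baseline than the font-based guess above is a vertical projection profile: in a binarized text line, cut wherever a column contains no ink. This is a different technique from the one described above, shown only as a starting point; `split_characters` and `min_width` are hypothetical names:

```python
import numpy as np

def split_characters(line_mask, min_width=1):
    """Return (start, end) column ranges of consecutive columns with ink."""
    # True for every column that contains at least one text pixel.
    has_ink = np.asarray(line_mask).astype(bool).any(axis=0)
    spans = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                          # a new character begins
        elif not ink and start is not None:
            if x - start >= min_width:
                spans.append((start, x))       # end is exclusive
            start = None
    # Close a character that runs up to the right edge of the line.
    if start is not None and len(has_ink) - start >= min_width:
        spans.append((start, len(has_ink)))
    return spans
```

This only works when characters are separated by at least one blank column. Touching or italic characters will come out merged, which is where something smarter, like the font-aware matching guessed at above, would be needed.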

As for recognition, once you have a character, there is a nice trigonometric trick of comparing angles of the character to the angles stored in a database. Works great on handwriting too.

I am not an expert on how exactly page segmentation works, but it seems that I am on my way to becoming one. I am just working on a project that includes it. So give me a month and I'll be able to tell you more. :D

Anyway, you should go and read Tesseract code to see how HP and Google did it there. It should give you pretty good ideas.

Good luck!

Upvotes: 1
