How can I separate lines in this text for OCR?

I want to use OCR on this block of text:

[image: the block of text]

It works well on some lines, but on others it either detects nothing or returns gibberish. I'm pretty sure this is because the text is skewed: if I alter the angle of the block just slightly, I get better or worse results for certain lines.

Normally I would use contours to deskew the whole block. However, each line has a different skew, so I thought it would be best to separate the lines and then deskew and run OCR on each one independently. I wanted to use the Hough transform to detect the horizontal gaps separating the text lines, but it only seems to detect vertical lines. Do you have any idea how to fix this, or do you have an entirely different idea for deskewing the image?
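To make the "separate each line" step concrete, here is a minimal sketch of what I have in mind. It uses a horizontal projection profile rather than the Hough transform, and it assumes a binarized image with dark text on a white background. The names `split_text_lines` and `min_gap` are hypothetical, and the approach only holds while the skew is small enough that neighbouring lines do not overlap vertically.

```python
import numpy as np


def split_text_lines(binary, min_gap=1):
    """Return (top, bottom) row ranges of text lines; bottom is exclusive.

    Assumption: `binary` is a 2-D uint8 array with dark text (0)
    on a white background (255).
    """
    # Horizontal projection profile: number of text pixels in each row.
    ink_per_row = np.count_nonzero(binary < 128, axis=1)
    has_ink = ink_per_row > 0

    # Collect consecutive runs of rows that contain text.
    bands = []
    start = None
    for y, ink in enumerate(has_ink):
        if ink and start is None:
            start = y
        elif not ink and start is not None:
            bands.append((start, y))
            start = None
    if start is not None:
        bands.append((start, len(has_ink)))

    # Merge bands separated by fewer than `min_gap` blank rows,
    # so that detached dots and accents stay with their line.
    merged = []
    for top, bottom in bands:
        if merged and top - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], bottom)
        else:
            merged.append((top, bottom))
    return merged


# Synthetic example: two dark bands on a white page.
page = np.full((20, 30), 255, dtype=np.uint8)
page[2:5, 3:25] = 0
page[9:13, 3:25] = 0
line_bands = split_text_lines(page)
```

Each band could then be cropped with `binary[top:bottom]` and deskewed and passed to OCR on its own.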

Here's the code for the Hough transform:

import cv2
import imutils
import numpy as np


def hough_lines2(cvImage):
    img = cvImage.copy()
    # The input image is already pre-processed, so no binarization is needed here.
    edges = cv2.Canny(img, 50, 150, apertureSize=3)
    # I invert the edges since I want to detect lines where there is no text,
    # i.e. the space between the text lines.
    inv = np.invert(edges)
    # I use maxLineGap=1 since I only want to detect lines where there is no
    # text in the way.
    linesP = cv2.HoughLinesP(inv, 1, np.pi / 180, 200,
                             minLineLength=150, maxLineGap=1)
    # Draw the detected segments in green.
    img2 = cv2.cvtColor(inv, cv2.COLOR_GRAY2BGR)
    if linesP is not None:
        # linesP has shape (N, 1, 4); each row holds x1, y1, x2, y2.
        for x1, y1, x2, y2 in linesP[:, 0]:
            cv2.line(img2, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
    # Display the lines in the image.
    cv2.namedWindow("Resized", cv2.WINDOW_NORMAL)
    cv2.resizeWindow("Resized", 600, 900)
    cv2.imshow("Resized", imutils.resize(img2, width=500))
    cv2.waitKey(0)
    return 0

And these are the detected lines:

[image: the detected lines drawn on the inverted edge image]
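To check whether any near-horizontal segments come back at all, the result could be filtered by angle before drawing. This is a minimal sketch, assuming the segments are given as `(x1, y1, x2, y2)` rows (for example `linesP[:, 0]` from `cv2.HoughLinesP`). The names `keep_near_horizontal` and `max_angle_deg` are hypothetical, and the 10 degree tolerance is an arbitrary starting point.

```python
import math


def keep_near_horizontal(segments, max_angle_deg=10.0):
    """Keep only segments within `max_angle_deg` of the horizontal axis."""
    kept = []
    for x1, y1, x2, y2 in segments:
        # Direction of the segment in degrees, in the range (-180, 180].
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
        # Fold into [0, 90] so the endpoint order does not matter.
        deviation = abs(angle) % 180
        deviation = min(deviation, 180 - deviation)
        if deviation <= max_angle_deg:
            kept.append((int(x1), int(y1), int(x2), int(y2)))
    return kept


# Hypothetical segments: two nearly horizontal, one vertical, one diagonal.
sample_segments = [
    (0, 0, 100, 2),
    (10, 0, 10, 100),
    (100, 50, 0, 52),
    (0, 0, 50, 50),
]
horizontal_segments = keep_near_horizontal(sample_segments)
```

If the filtered list is empty for the real image, the problem lies in what the Hough transform is given as input rather than in how the segments are drawn.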
