Bro From Space

Reputation: 99

How to transcribe text from the highlighted areas of an image?

How can I transcribe the text from the highlighted areas of the following image with Tesseract in Python?

Input image

Upvotes: 0

Views: 721

Answers (2)

HansHirse

Reputation: 18925

Assuming you have a distinct color for the highlighted areas, which isn't present in the rest of the image – like the prominent red color for the highlighting in your example – you can use color thresholding in the HSV color space using cv2.inRange.
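As a side note, the semantics of cv2.inRange can be sketched in plain NumPy (the HSV values below are made-up illustrations, not taken from the actual image): a pixel is 255 in the mask if and only if every channel lies within the corresponding [lower, upper] bounds.

```python
import numpy as np

# Two hypothetical HSV pixels (after the hue shift described below):
# the first is strongly saturated red-ish, the second is too washed-out
hsv = np.array([[[ 90, 220, 200],
                 [ 90,  50, 200]]], dtype=np.uint8)

lower = np.array([ 70, int(0.80 * 255), int(0.50 * 255)])
upper = np.array([110, int(1.00 * 255), int(1.00 * 255)])

# Equivalent of cv2.inRange(hsv, lower, upper):
# all channels must fall within the limits for a pixel to be masked
mask = (np.all((hsv >= lower) & (hsv <= upper), axis=-1)
        .astype(np.uint8) * 255)
print(mask)  # [[255   0]]
```

Only the first pixel survives; the second is rejected because its saturation (50) falls below the 80 % threshold.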

To do so, set up appropriate lower and upper limits for hue, saturation, and value. In the given example, we're detecting red-ish colors, so in general we'd need two sets of limits, since red-ish hues sit at the 0°/180° "wrap-around" of the hue cylinder. To get by with a single set of limits, we shift the obtained hue channel by 90° and take it modulo 180°. Also, we have highly saturated, quite bright red-ish colors, so we might look at saturation levels above 80 % and value levels above 50 %. We get this mask:
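The hue-shift trick can be illustrated on a few sample hue values (OpenCV stores 8-bit hue in [0, 180)): red-ish hues near both ends of the range end up clustered around 90, so one range suffices.

```python
import numpy as np

# Sample red-ish hue values straddling the 0°/180° wrap-around
h = np.array([0, 5, 175, 179], dtype=np.uint8)

# Shift by 90 and wrap modulo 180: all values now cluster around 90
h_shifted = ((h.astype(int) + 90) % 180).astype(np.uint8)
print(h_shifted)  # [90 95 85 89]
```

All shifted values fall within the single [70, 110] hue range used in the code below.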

Mask

The last thing to do is to obtain the contours from the generated mask, get the corresponding bounding rectangles, and run pytesseract on their contents (grayscaled and thresholded using Otsu's method for better OCR performance). I'd also suggest using the --psm 6 option here.

Here's the full code including the results:

import cv2
import numpy as np
import pytesseract

# Read image
img = cv2.imread('E5PY2.jpg')

# Convert to HSV color space, and split channels
h, s, v = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2HSV))

# Shift hue channel to detect red area using only one range
h_2 = ((h.astype(int) + 90) % 180).astype(h.dtype)

# Mask highlighted boxes using color thresholding
lower = np.array([ 70, int(0.80 * 255), int(0.50 * 255)])
upper = np.array([110, int(1.00 * 255), int(1.00 * 255)])
highlighted = cv2.inRange(cv2.merge([h_2, s, v]), lower, upper)

# Find contours w.r.t. the OpenCV version; retrieve bounding rectangles
cnts = cv2.findContours(highlighted, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
rects = [cv2.boundingRect(cnt) for cnt in cnts]

# Iterate bounding boxes, and OCR
for x, y, w, h in rects:

    # Grayscale, and threshold using Otsu
    work = cv2.cvtColor(img[y:y+h, x:x+w], cv2.COLOR_BGR2GRAY)
    work = cv2.threshold(work, 0, 255, cv2.THRESH_OTSU)[1]

    # Pytesseract with -psm 6
    text = pytesseract.image_to_string(work, config='--psm 6')\
        .replace('\n', '').replace('\f', '')
    print('X: {}, Y: {}, Text: {}'.format(x, y, text))
    # X: 468, Y: 1574, Text: START MEDITATING
    # X: 332, Y: 1230, Text: Well done. By signing up, you’ve taken your first
    # X: 358, Y: 182, Text: Welcome

Caveat: I use a special version of Tesseract from the Mannheim University Library.

----------------------------------------
System information
----------------------------------------
Platform:      Windows-10-10.0.19041-SP0
Python:        3.9.1
PyCharm:       2021.1.1
NumPy:         1.20.3
OpenCV:        4.5.2
pytesseract:   5.0.0-alpha.20201127
----------------------------------------

Upvotes: 1

Natthaphon Hongcharoen

Reputation: 2440

From top to bottom, the boxes are approximately at (x1, y1, x2, y2)

  • 0.2564, 0.1070, 0.6293, 0.166
  • 0.2377, 0.6826, 0.7645, 0.703
  • 0.331, 0.88, 0.6713, 0.913

relative to the image's width and height. The full code would be:

import cv2
import pytesseract

image = cv2.imread('E5PY2.jpg')
coords = [[0.2564, 0.1070, 0.6293, 0.166],
          [0.2377, 0.6826, 0.7645, 0.703],
          [0.331, 0.88, 0.6713, 0.913]]
h, w, c = image.shape
for idx, (x1, y1, x2, y2) in enumerate(coords):
    x1 = int(x1 * w)
    x2 = int(x2 * w)
    y1 = int(y1 * h)
    y2 = int(y2 * h)
    print(pytesseract.image_to_string(image[y1:y2, x1:x2]))

Upvotes: 0

Related Questions