Kate

Reputation: 330

Extract text inside rectangle from image

I have an image with multiple red rectangles. The rectangle detection and image extraction work well.

I'm using https://github.com/autonise/CRAFT-Remade for text detection.

original:

[original image]

my image:

[my image with red rectangles]

I tried to extract the text inside all the rectangles with pytesseract, but without success. Output result:

r

2

aseeaaei

ae

How can we accurately extract the text from this image?

Part of my code:

import os
import cv2
import numpy as np

def saveResult(img_file, img, boxes, dirname='./result/', verticals=None, texts=None):
    """ Save text detection results one by one.
    Args:
        img_file (str): image file name
        img (array): raw image context
        boxes (array): array of detected boxes
            Shape: [num_detections, 4] for BB output / [num_detections, 8] for QUAD output
    Return:
        None
    """
    img = np.array(img)

    # make result file list
    filename, file_ext = os.path.splitext(os.path.basename(img_file))

    # result directory
    res_file = dirname + "res_" + filename + '.txt'
    res_img_file = dirname + "res_" + filename + '.jpg'

    if not os.path.isdir(dirname):
        os.mkdir(dirname)

    with open(res_file, 'w') as f:
        for i, box in enumerate(boxes):
            poly = np.array(box).astype(np.int32).reshape((-1))
            strResult = ','.join([str(p) for p in poly]) + '\r\n'
            f.write(strResult)

            poly = poly.reshape(-1, 2)
            cv2.polylines(img, [poly.reshape((-1, 1, 2))], True, color=(0, 0, 255), thickness=2)  # HERE
            ptColor = (0, 255, 255)
            if verticals is not None and verticals[i]:
                ptColor = (255, 0, 0)

            if texts is not None:
                font = cv2.FONT_HERSHEY_SIMPLEX
                font_scale = 0.5
                # draw the label twice for a shadow effect
                cv2.putText(img, "{}".format(texts[i]), (poly[0][0] + 1, poly[0][1] + 1), font, font_scale, (0, 0, 0), thickness=1)
                cv2.putText(img, "{}".format(texts[i]), tuple(poly[0]), font, font_scale, (0, 255, 255), thickness=1)

    # Save result image
    cv2.imwrite(res_img_file, img)

After your comment, here's the result:

[modified image]

The Tesseract result is good for a first test, but not accurate:

400
300
200

“2615

1950



24
16

Upvotes: 3

Views: 3523

Answers (1)

nathancy

Reputation: 46670

When using Pytesseract to extract text, preprocessing the image is extremely important. In general, we want to preprocess the image so that the desired text is black on a white background. To do this, we can use Otsu's threshold to obtain a binary image, then perform morphological operations to filter and remove noise. Here's a pipeline:

  • Convert image to grayscale and resize
  • Otsu's threshold for binary image
  • Invert image and perform morphological operations
  • Find contours
  • Filter using contour approximation, aspect ratio, and contour area
  • Remove unwanted noise
  • Perform text recognition

After converting to grayscale, we resize the image using imutils.resize(), then apply Otsu's threshold to obtain a binary image. The image is now purely black and white, but it still contains unwanted noise.

From here we invert the image and perform morphological operations with a horizontal kernel. This step merges the text into single contours so we can filter out the unwanted lines and small blobs.

Next we find contours and filter using a combination of contour approximation, aspect ratio, and contour area to isolate the unwanted sections. The removed noise is highlighted in green.

With the noise removed, we invert the image again so the desired text is black, then perform text extraction. I've also noticed that adding a slight blur enhances recognition. Here's the cleaned image we perform text extraction on.

We give Pytesseract the --psm 6 configuration since we want to treat the image as a uniform block of text. Here's the result from Pytesseract:

6745 63 6 10.50
2245 21 18 17
525 4 22 0.18
400 4 a 0.50
300 3 4 0.75
200 2 3 0.22
2575 24 3 0.77
1950 ii 12 133

The output isn't perfect, but it's close. You can experiment with additional configuration settings.

import cv2
import pytesseract
import imutils

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Resize, grayscale, Otsu's threshold
image = cv2.imread('1.png')
image = imutils.resize(image, width=500)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Invert image and perform morphological operations
inverted = 255 - thresh
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15,3))
close = cv2.morphologyEx(inverted, cv2.MORPH_CLOSE, kernel, iterations=1)

# Find contours and filter using aspect ratio and area
cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.01 * peri, True)
    x,y,w,h = cv2.boundingRect(approx)
    aspect_ratio = w / float(h)
    if (aspect_ratio >= 2.5 or area < 75):
        cv2.drawContours(thresh, [c], -1, (255,255,255), -1)

# Blur and perform text extraction
thresh = cv2.GaussianBlur(thresh, (3,3), 0)
data = pytesseract.image_to_string(thresh, lang='eng',config='--psm 6')
print(data)

cv2.imshow('close', close)
cv2.imshow('thresh', thresh)
cv2.waitKey()
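To compare --psm settings by more than eyeballing the output, pytesseract's image_to_data call returns per-word confidence scores. A small sketch — the 'result.png' path and the cutoff of 60 are arbitrary choices, not part of the pipeline above:

```python
def confident_words(data, min_conf=60):
    """Keep (word, confidence) pairs at or above min_conf from image_to_data output."""
    return [(word, float(conf))
            for word, conf in zip(data['text'], data['conf'])
            if word.strip() and float(conf) >= min_conf]

if __name__ == '__main__':
    import cv2
    import pytesseract
    from pytesseract import Output

    # 'result.png' stands in for the cleaned, thresholded image above
    img = cv2.imread('result.png', cv2.IMREAD_GRAYSCALE)
    data = pytesseract.image_to_data(img, config='--psm 6', output_type=Output.DICT)
    for word, conf in confident_words(data):
        print(word, conf)
```

Words that come back with low confidence are usually the ones worth re-checking against different preprocessing.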

Upvotes: 2
