Tech Five

Reputation: 95

Remove false text regions from an image

I'm working on a project to detect text in images. So far I have been able to isolate candidate text regions. I used threshold values for aspect ratio, contour area, and white pixel count inside the bounding box of a contour to remove non-text regions. However, I cannot set these thresholds too low, because some images contain small font sizes, so some non-text regions still remain. I read that the Stroke Width Transform is a solution for this problem, but it is too complicated. Is there any other method to remove these non-text regions? I thought of using the curve shape of text to distinguish the regions but couldn't think of a way to implement it.

This is a sample image

enter image description here

Identified regions

enter image description here

Upvotes: 3

Views: 868

Answers (1)

nathancy

Reputation: 46650

You can use simple contour area filtering to remove the noise. The idea is to find contours, filter using cv2.contourArea(), and draw the valid contours onto a blank mask. To reconstruct the image without the noise, we bitwise-and the input image with the mask to get our result.

Noise to remove highlighted in green

enter image description here

Result

enter image description here

Code

import cv2
import numpy as np 

# Load image, create blank mask, grayscale, Otsu's threshold
image = cv2.imread('1.png')
mask = np.zeros(image.shape, dtype=np.uint8)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# Find contours and filter using contour area
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area > 250:
        cv2.drawContours(mask, [c], -1, (255,255,255), -1)

# Bitwise and to reconstruct image
result = cv2.bitwise_and(image, mask)

cv2.imshow('mask', mask)
cv2.imshow('result', result)
cv2.waitKey()

Note: If you know that the text will be yellow, another approach would be to use color thresholding to isolate the text. You can use this HSV color thresholder script to determine the lower/upper bounds.

Upvotes: 2
