EJL

Reputation: 205

How to recognize text with colored background images?

I am new to OpenCV, Python, and Tesseract. I am writing a script that recognizes text in an image. My code works well for black text on a white background or white text on a black background, but not for colored images, for example white text on a blue background such as a button. Does the font also affect this? In this case, I am trying to find the "Reboot" text (the button).

This is the sample image:

I tried a number of image preprocessing methods with OpenCV (binarization, noise reduction, grayscale conversion), but none of them gave the expected result.

This is the sample code:

from PIL import Image
import pytesseract
import cv2
import numpy as np

# image = Image.open('image.png')
# image = image.convert('-1')
# image.save('new.png')

filename = 'image.png'
outputname = 'converted.png'

# grayscale -----------------------------------------------------
image = cv2.imread(filename)
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imwrite(outputname,gray_image)

# binarize -----------------------------------------------------
im_gray = cv2.imread(outputname, cv2.IMREAD_GRAYSCALE)
(thresh, im_bw) = cv2.threshold(im_gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
cv2.imwrite(outputname, im_bw)

# remove noise -----------------------------------------------------
im = cv2.imread(outputname)
morph = im.copy()

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 1))
morph = cv2.morphologyEx(morph, cv2.MORPH_CLOSE, kernel)
morph = cv2.morphologyEx(morph, cv2.MORPH_OPEN, kernel)

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
image_channels = np.split(np.asarray(morph), 3, axis=2)

channel_height, channel_width, _ = image_channels[0].shape

# apply Otsu threshold to each channel
for i in range(0, 3):
    _, image_channels[i] = cv2.threshold(image_channels[i], 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY)
    image_channels[i] = np.reshape(image_channels[i], newshape=(channel_height, channel_width, 1))

# merge the channels
image_channels = np.concatenate((image_channels[0], image_channels[1], image_channels[2]), axis=2)

# save the denoised image
cv2.imwrite(outputname, image_channels)

image = Image.open(outputname)
data_string = pytesseract.image_to_data(image, config='--oem 1')
data_string = data_string.encode('utf-8')
open('image.tsv', 'wb').write(data_string)

Running the code, I get this image: [image]

And this is the Tesseract result in TSV format:

level   page_num    block_num   par_num line_num    word_num    left    top width   height  conf    text
1   1   0   0   0   0   0   0   1024    768 -1  
2   1   1   0   0   0   2   13  1002    624 -1  
3   1   1   1   0   0   2   13  1002    624 -1  
4   1   1   1   1   0   172 13  832 22  -1  
5   1   1   1   1   1   172 13  127 22  84  CONFIGURATION
5   1   1   1   1   2   822 17  59  11  92  CENTOS
5   1   1   1   1   3   887 17  7   11  95  7
5   1   1   1   1   4   900 17  104 11  95  INSTALLATION
4   1   1   1   2   0   86  29  900 51  -1  
5   1   1   1   2   1   86  35  15  45  12  4
5   1   1   1   2   2   825 30  27  40  50  Bes
5   1   1   1   2   3   952 29  34  40  51  Hel
4   1   1   1   3   0   34  91  87  17  -1  
5   1   1   1   3   1   34  91  87  17  90  CentOS
4   1   1   1   4   0   2   116 9   8   -1  
5   1   1   1   4   1   2   116 9   8   0   ‘
4   1   1   1   5   0   184 573 57  14  -1  
5   1   1   1   5   1   184 573 57  14  90  Complete!
4   1   1   1   6   0   634 606 358 14  -1  
5   1   1   1   6   1   634 606 43  10  89  CentOS
5   1   1   1   6   2   683 609 7   7   96  is
5   1   1   1   6   3   696 609 24  7   96  now
5   1   1   1   6   4   725 606 67  14  96  successfully
5   1   1   1   6   5   797 606 45  10  96  installed
5   1   1   1   6   6   848 606 18  10  96  and
5   1   1   1   6   7   872 599 29  25  96  ready
5   1   1   1   6   8   906 599 15  25  95  for
5   1   1   1   6   9   928 609 20  11  96  you
5   1   1   1   6   10  953 608 12  8   96  to
5   1   1   1   6   11  971 606 21  10  95  use!
4   1   1   1   7   0   775 623 217 14  -1  
5   1   1   1   7   1   775 623 15  10  95  Go
5   1   1   1   7   2   796 623 31  10  96  ahead
5   1   1   1   7   3   833 623 18  10  96  and
5   1   1   1   7   4   857 623 38  10  96  reboot
5   1   1   1   7   5   900 625 12  8   96  to
5   1   1   1   7   6   918 625 25  8   95  start
5   1   1   1   7   7   949 626 28  11  96  using
5   1   1   1   7   8   983 623 9   10  93  it!

As you can see, the "Reboot" text is not showing. Maybe it is because of the font? Or the color?

Upvotes: 2

Views: 8386

Answers (1)

nathancy

Reputation: 46600

Here are two different approaches:

1. Traditional image processing and contour filtering

The main idea is to extract the ROI then apply Tesseract OCR.

  • Convert image to grayscale and Gaussian blur
  • Adaptive threshold
  • Find contours
  • Iterate through contours and filter using contour approximation and area
  • Extract ROI

Once we obtain a binary image from adaptive thresholding, we find contours and filter them using contour approximation with cv2.arcLength() and cv2.approxPolyDP(). If the approximated contour has four points, we assume it is a rectangle or a square. In addition, we apply a second filter on contour area to make sure we isolate the correct ROI. Here's the extracted ROI:

[image: extracted ROI]

import cv2

image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,9,3)

cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

ROI_number = 0
for c in cnts:
    area = cv2.contourArea(c)
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.05 * peri, True)
    if len(approx) == 4 and area > 2200:
        x,y,w,h = cv2.boundingRect(approx)
        ROI = image[y:y+h, x:x+w]
        cv2.imwrite('ROI_{}.png'.format(ROI_number), ROI)
        ROI_number += 1

Now we can throw this into Pytesseract. Tesseract generally works best with dark text on a light background, so we do a bit of preprocessing first: Otsu's threshold followed by inversion. Here's the preprocessed image and the result from Pytesseract:

[image: preprocessed ROI]

Reboot

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

image = cv2.imread('ROI_0.png', 0)  # ROI saved by the previous snippet, loaded as grayscale
thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

result = 255 - thresh 

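# --psm 10 treats the image as a single character; --psm 7 (single line) or --psm 8 (single word) are alternatives
data = pytesseract.image_to_string(result, lang='eng', config='--psm 10')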
print(data)

cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()

Normally, you would also apply morphological transformations to smooth the image, but in this case the text is already clean enough.

2. Color Thresholding

The second approach is color thresholding: we convert the image to HSV and use lower and upper HSV bounds to create a mask of the button color, from which we can extract the ROI. Once the ROI is extracted, we follow the same steps to preprocess the image before throwing it into Pytesseract.

Upvotes: 3
