Hisan

Reputation: 2655

How to do character segmentation on an image (see description)?

I want to segment the characters from the background. So far I have been able to detect the text and generate bounding boxes around it. (see image)

[image: detected text with bounding boxes]

Some people also consider generating bounding boxes around the text to be segmentation, but what I'm looking for is segmentation of the characters from the background. (see image, the green part)

[image: characters segmented from the background, shown in green]

I would use this segmentation to remove the Korean text and replace it with its English translation.

Maybe you'd be thinking of using black / white color detection to make this segmentation, but the background won't necessarily be pure white.

Some Other Examples:

[images: additional example panels]

Can you provide any leads on the problem at hand? My end goal is to superimpose the English text on the image such that it looks natural and not weird.

I am open to any suggestions and ideas; if you'd like to come up with a completely different, out-of-the-box solution to segment the image, that would also be great. All I'm looking for is replacing the Korean text with its English alternative.

Upvotes: 1

Views: 882

Answers (1)

t2solve

Reputation: 928

So you already have the surrounding boxes wrapping the text, right? (I read that in the comments.)

So the remaining steps are:

A) Extract your region of interest (ROI). Be aware of how the corner coordinates are addressed: NumPy slicing takes img[y1:y2, x1:x2], rows first, then columns.

B) Separate the letters from the background. I would use a simple binary threshold here. The only remaining challenge is to find a globally optimal threshold value for all your pictures. See the Python code for A and B below:

import cv2 as cv
from matplotlib import pyplot as plt

# read the panel as grayscale and smooth it to reduce noise
img = cv.imread('kjgqd.png', 0)
img = cv.medianBlur(img, 5)

# A) extract the ROI using the box coordinates: img[y1:y2, x1:x2]
imgROI = img[75:230, 103:330]

# B) global binary threshold: pixels above 127 become 255, the rest 0
ret, th1 = cv.threshold(imgROI, 127, 255, cv.THRESH_BINARY)

# show the ROI and its thresholded version
titles = ['Original Image', 'Global Thresholding (v = 127)']
images = [imgROI, th1]
for i in range(2):
    plt.subplot(2, 1, i + 1), plt.imshow(images[i], 'gray')
    plt.title(titles[i])
    plt.xticks([]), plt.yticks([])
plt.show()

[images: input panel, ROI cut, and thresholded output]
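If a fixed value like 127 does not hold across all your pictures (the challenge mentioned in B), one common alternative is Otsu's method, which computes a threshold per image from its histogram. This is my suggestion, not part of the original answer; a minimal sketch reusing the file name and ROI coordinates from above:

import cv2 as cv

img = cv.imread('kjgqd.png', 0)
img = cv.medianBlur(img, 5)
imgROI = img[75:230, 103:330]

# with THRESH_OTSU the given value (0 here) is ignored and
# an optimal threshold is computed from the ROI's histogram
ret, th2 = cv.threshold(imgROI, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)
print('Otsu threshold:', ret)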

C) How do you get the text from the picture into written language? Welcome to the big world of OCR. This is by far the most complex part of your task.

I would really recommend using a framework like Tesseract. A good introduction is here.
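As a rough sketch of what the OCR call could look like, assuming the pytesseract wrapper and the Korean language data (kor.traineddata) are installed; the file name and ROI coordinates are simply reused from the snippet above:

import cv2
import pytesseract

# load the panel and cut out the text box found earlier
img = cv2.imread('kjgqd.png')
roi = img[75:230, 103:330]

# 'kor' selects the Korean language model
text = pytesseract.image_to_string(roi, lang='kor')
print(text)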

In the worst case you will have to train your own model for your language. I would only recommend this if your language or font set is not yet well supported by Tesseract. See how-tos 1 and 2.

For educational purposes, and only for that use case, I would recommend starting with the topic of OCR via the famous MNIST example.

Here is a good overview of the zoo of possible methods for getting text out of pictures.

D) Translate your text.
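Any translation API will do for this step. As one possible sketch, assuming the third-party deep-translator package is installed (the package choice is my assumption, not part of the original answer):

from deep_translator import GoogleTranslator

korean_text = '안녕하세요'  # placeholder for the OCR output from step C
english_text = GoogleTranslator(source='ko', target='en').translate(korean_text)
print(english_text)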

E) Put the English translation into the picture, using cv2.putText like:

import numpy as np
import cv2

# white dummy picture; note the shape is (height, width, channels)
img = np.zeros((200,600,3), np.uint8)
img.fill(255)  # we make it white

# Write some text
font                   = cv2.FONT_HERSHEY_SIMPLEX
bottomLeftCornerOfText = (10,100)  # (x, y) of the text baseline
fontScale              = 1
fontColor              = (0,0,0)
thickness              = 2  # the 7th positional argument of putText is the thickness

cv2.putText(img,'Hello World!',
    bottomLeftCornerOfText,
    font,
    fontScale,
    fontColor,
    thickness)

# Display the image
cv2.imshow("img",img)
cv2.waitKey(0)
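To make the result look natural rather than pasted on, you can measure the rendered string with cv2.getTextSize and center it inside the box you cut out in A. A small sketch, assuming the same ROI coordinates as above:

import cv2

text = 'Hello World!'
font = cv2.FONT_HERSHEY_SIMPLEX
fontScale, thickness = 1, 2

# bounding box of the ROI from part A: img[75:230, 103:330]
x1, y1, x2, y2 = 103, 75, 330, 230

# measure how large the rendered text will be
(w, h), baseline = cv2.getTextSize(text, font, fontScale, thickness)

# place the baseline so the text is centered inside the box
org = (x1 + ((x2 - x1) - w) // 2, y1 + ((y2 - y1) + h) // 2)

Combined with filling the box with the local background color first (for example using the mask from B), this should avoid the "weird" look the question mentions.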

Upvotes: 1
