Khazad

Reputation: 33

Text binarization

I'd like to binarize this image so that I can use it with tesseract-ocr: http://imgur.com/A5u9xSA

So far, I have managed to get this: http://imgur.com/bU0FSt8

But I need a clean image containing only the text, without the black background regions, like this one: imgur.com/KXQNErM

My current code:

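import cv2

# `path` is the location of the input image (defined elsewhere).
img = cv2.imread(path, 0)  # 0 = load as grayscale
blur = cv2.GaussianBlur(img, (3, 3), 0)
# Adaptive threshold on the blurred image: 405x405 neighbourhood, constant C = 1.
filtered = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 405, 1)
bitnot = cv2.bitwise_not(filtered)  # invert black and white
cv2.imshow('image', bitnot)
cv2.imwrite("h2kcw2/out1.png", bitnot)
cv2.waitKey(0)
cv2.destroyAllWindows()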

Upvotes: 3

Views: 3238

Answers (1)

Eliezer Bernart

Reputation: 2426

A regular (global) threshold can give a good result here:

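import cv2

# `path` is the location of the input image (defined elsewhere).
img = cv2.imread(path, 0)  # 0 = load as grayscale
# Inverse binary threshold: pixels above 70 become 0, all others become 255.
ret, thresh = cv2.threshold(img, 70, 255, cv2.THRESH_BINARY_INV)
cv2.imshow('image', thresh)
cv2.imwrite("h2kcw2/out1.png", thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()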
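
To make it clearer what the inverse binary threshold above is doing per pixel, here is a small NumPy sketch of the same rule. The function name `binary_inv_threshold` and the sample array are made up for illustration; in real code `cv2.threshold` is the call to use, and the value 70 is just the one from the snippet above, which may need tuning for other images.

```python
import numpy as np

def binary_inv_threshold(gray, thresh=70, maxval=255):
    """Hypothetical helper mimicking cv2.THRESH_BINARY_INV:
    pixels strictly above `thresh` become 0, all others become `maxval`."""
    gray = np.asarray(gray, dtype=np.uint8)
    return np.where(gray > thresh, 0, maxval).astype(np.uint8)

# Tiny made-up grayscale "image" to show the effect around the cutoff.
sample = np.array([[10, 70, 71],
                   [200, 69, 255]], dtype=np.uint8)
result = binary_inv_threshold(sample)
print(result)
```

Dark pixels (at or below the cutoff) end up white and everything brighter ends up black, so a single global cutoff is enough when the text is clearly darker than the rest of the image.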

Upvotes: 4
