Python: cleaning an image to solve a captcha

Question

I need to solve captchas automatically in order to collect public data from websites.

I use Python and OpenCV, and I'm a newbie at image processing. After some searching, I came up with the following approach. Since the text in the captcha uses a group of related colours, I apply a mask in HSV colour space, then convert the image to grayscale and apply an adaptive threshold (ADAPTIVE_THRESH_MEAN_C) to remove noise from the image.

But this does not remove enough noise for automatic text recognition with OCR (Tesseract) to work. See the images below.

Is there something I can improve in my solution, or is there a better way?

Original images:

Processed images:

from PIL import Image

# Load the captcha and convert it to 8-bit grayscale ("L" mode)
image = Image.open("captcha-img.png").convert("L")
pixel_matrix = image.load()  # pixel access is indexed as [x, y]

# thresholding: every pixel that is not pure black becomes white
for y in range(image.height):
    for x in range(image.width):
        if pixel_matrix[x, y] != 0:
            pixel_matrix[x, y] = 255

# stray line and pixel removal: clear a black pixel when the pixels directly
# above and below it, or directly left and right of it, are both white
for y in range(1, image.height - 1):
    for x in range(1, image.width - 1):
        if pixel_matrix[x, y] == 0 \
            and pixel_matrix[x, y - 1] == 255 and pixel_matrix[x, y + 1] == 255:
            pixel_matrix[x, y] = 255
        if pixel_matrix[x, y] == 0 \
            and pixel_matrix[x - 1, y] == 255 and pixel_matrix[x + 1, y] == 255:
            pixel_matrix[x, y] = 255

image.save("output.png")
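
A possible variant of the stray-pixel step is to count all eight neighbours of each black pixel instead of checking only the vertical and horizontal pairs. The sketch below is an alternative, not the method used in the code above; the function name `remove_isolated_pixels` and the `min_neighbours` default are hypothetical. It works on a plain nested list so that it has no dependencies, and it reads from the input while writing to a copy, so the result does not depend on scan order.

```python
def remove_isolated_pixels(pixels, min_neighbours=2, black=0, white=255):
    """Return a copy of a 2D list of pixel values in which every black pixel
    with fewer than min_neighbours black 8-neighbours is set to white."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    cleaned = [row[:] for row in pixels]

    for y in range(height):
        for x in range(width):
            if pixels[y][x] != black:
                continue
            # Count black pixels among the (up to) eight surrounding cells.
            neighbours = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width \
                            and pixels[ny][nx] == black:
                        neighbours += 1
            if neighbours < min_neighbours:
                cleaned[y][x] = white
    return cleaned
```

With Pillow, the nested list could be built by reading `pixel_matrix[x, y]` for every position after thresholding. A higher `min_neighbours` removes more noise but also starts to erode thin character strokes.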
