How to preprocess an image to remove noise and extract text Python?

Question

I have a really noisy image that I have to perform OCR on. The snippet attached is part of a bigger image. How would I go about pre-processing this image in the most optimal way?

I have already tried pre-processing the image using Otsu Binarization, smoothing the image using various filters and Erosion-Dilation. I've also used connectedComponentWithStats to remove the noise in the image. But none of this helps with the processing of the smudged text

Edit - This text needs to be pre-processed in order to perform OCR

img = cv2.imread(file,0)
gaus = cv2.GaussianBlur(img,(5,5),0)

_, blackAndWhite = cv2.threshold(gaus, 127, 255, cv2.THRESH_BINARY_INV)

nlabels, labels, stats, centroids = cv2.connectedComponentsWithStats(blackAndWhite, None, None, None, 8, cv2.CV_32S)
sizes = stats[1:, -1] 
img2 = np.zeros((labels.shape), np.uint8)

for i in range(0, nlabels - 1):
    if sizes[i] >= 50:  
        img2[labels == i + 1] = 255

res = cv2.bitwise_not(img2)

(thresh, img_bin) = cv2.threshold(img, 128, 255,cv2.THRESH_BINARY|     cv2.THRESH_OTSU)

img_bin = 255-img_bin 

kernel_length = np.array(img).shape[1]//80
 
verticle_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, kernel_length))

hori_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_length, 1))

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

img_temp1 = cv2.erode(img_bin, verticle_kernel, iterations=3)
verticle_lines_img = cv2.dilate(img_temp1, verticle_kernel, iterations=3)

img_temp2 = cv2.erode(img_bin, hori_kernel, iterations=3)
horizontal_lines_img = cv2.dilate(img_temp2, hori_kernel, iterations=3)

alpha = 0.5
beta = 1.0 - alpha

img_final_bin = cv2.addWeighted(verticle_lines_img, alpha, horizontal_lines_img, beta, 0.0)

img_final_bin = cv2.erode(~img_final_bin, kernel, iterations=2)
(thresh, img_final_bin) = cv2.threshold(img_final_bin, 128,255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

How to preprocess an image to remove noise and extract text Python?

Answers (1)

Related Questions