Reputation: 19
[This is a sample image]
I want to crop out the header text from several other similarly colored images like this one for OCR. What are the most effective preprocessing steps for better recognition of just the header text?
Upvotes: 0
Views: 1229
Reputation: 2269
ATTENTION
To everyone who wants to copy this code and use it in other projects: you will have to tweak and adapt it (especially the threshold/kernel/iterations values). This version works best on the image provided by the user.
import cv2
image = cv2.imread("image.jpg")
image_c = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # grayscale
cv2.imshow('gray', gray)
cv2.waitKey(0)
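_, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU) # inverted binary threshold; with THRESH_OTSU the threshold is computed automatically and the passed value is ignored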
cv2.imshow('thresh', thresh)
cv2.waitKey(0)
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
dilated = cv2.dilate(thresh, kernel, iterations=13) # dilate
cv2.imshow('dilated', dilated)
cv2.waitKey(0)
contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2] # get contours; [-2] works on both OpenCV 3.x (3 return values) and 4.x (2 return values)
# for each contour found, draw a rectangle around it on original image
for i, contour in enumerate(contours):
    # get rectangle bounding contour
    x, y, w, h = cv2.boundingRect(contour)
    roi = image_c[y:y + h, x:x + w]
    if 50 < h < 100 or 200 < w < 420: # these values are specific for this example
        # draw rectangle around contour on original image
        rect = cv2.rectangle(image_c, (x, y), (x + w, y + h), (255, 255, 255), 1)
        cv2.imshow('rectangles', rect)
        cv2.waitKey(0)
        cv2.imwrite('extracted{}.png'.format(i), roi)
# write original image with added contours to disk - change values above to (255,0,255) to see clearly the contours
cv2.imwrite("contoured.jpg", image_c)
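The size filter in the loop uses pixel values that only fit this one image. One possible way to make it carry over to images of other resolutions is to express the limits as fractions of the image size and keep the topmost matching box as the header. The sketch below is only an illustration of that idea, not part of the tested code above: the function name pick_header_box and all the *_frac defaults are made up and need tuning. It also requires both the height and width conditions, which is stricter than the `or` used above.

```python
def pick_header_box(boxes, image_w, image_h,
                    min_h_frac=0.05, max_h_frac=0.25, min_w_frac=0.3):
    """Return the topmost box whose size fits the relative limits, or None.

    boxes: list of (x, y, w, h) tuples, e.g. collected from cv2.boundingRect.
    The *_frac defaults are hypothetical and must be tuned per image set.
    """
    candidates = [
        (x, y, w, h) for (x, y, w, h) in boxes
        if min_h_frac * image_h <= h <= max_h_frac * image_h
        and w >= min_w_frac * image_w
    ]
    # Assumption: the header is the candidate closest to the top edge.
    return min(candidates, key=lambda b: b[1], default=None)


# Made-up boxes for an 800x600 image: body text, header, small noise blob.
boxes = [(10, 300, 500, 60), (40, 20, 380, 70), (5, 5, 30, 10)]
header_box = pick_header_box(boxes, image_w=800, image_h=600)
print(header_box)
```

With real data, boxes would be the cv2.boundingRect results gathered in the loop, and image_w / image_h would come from image.shape.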
Upvotes: 3
Reputation: 101
Maybe you can detect the text first, then take the maximum row index of the detected area and crop the image at that row. There are multiple ways to detect text using OpenCV. You may try this question here.
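A rough sketch of that idea, assuming some text detector has already produced (x, y, w, h) boxes. The function name crop_to_header, the header_fraction cutoff and the margin are all hypothetical choices for illustration, and the image here is a synthetic NumPy array standing in for what cv2.imread would return.

```python
import numpy as np


def crop_to_header(image, text_boxes, header_fraction=0.3, margin=5):
    """Crop from the top of the image down to the lowest header text row.

    text_boxes: iterable of (x, y, w, h) boxes from any text detector.
    header_fraction: hypothetical cutoff; boxes that start below this
        fraction of the image height are not treated as header text.
    margin: extra rows kept below the text (also a made-up default).
    """
    height = image.shape[0]
    limit = header_fraction * height
    # Bottom edge (y + h) of every box that starts inside the header band.
    bottoms = [y + h for (_x, y, _w, h) in text_boxes if y < limit]
    if not bottoms:
        return None  # no header text detected
    max_row = min(height, max(bottoms) + margin)
    return image[:max_row]


# Synthetic 200x300 image and made-up detector output.
image = np.zeros((200, 300, 3), dtype=np.uint8)
boxes = [(20, 10, 120, 30), (150, 12, 100, 28), (20, 120, 200, 40)]
header = crop_to_header(image, boxes)
print(header.shape)
```

The cropped array can then be passed to the OCR step on its own.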
Upvotes: 0