Reputation: 267
I am currently working on a project where I need to process an image for OCR. I have filters set and in place to make the OCR's job as easy as possible, but there is one aspect of the image that I cannot figure out how to fix. In the included image you can see the text that I am trying to read ("PRTraining Tissue...") and there is a black border around the image that needs to be removed in order for my skew correction code to work. Is there any easy way to quickly fill in this black border with white without affecting the text?
Unfiltered Image:
Filtered Image:
I have already written some code to remove the majority of the background, but large black spots still remain as a border. The included code is my image cropping script: it removes most of the image's black border and tries to isolate the text as much as possible. Unfortunately, it still leaves a significant amount of black that breaks my skew correction script.
import cv2

def boarderRemoval(img):
    """
    Takes in a numpy array and crops the image down to isolate the text.
    Still leaves a small black border that varies from image to image.

    Vars:
    - img <- numpy array of the label
    Returns:
    - Cropped down image with smaller black borders
    """
    # Two return values is the OpenCV 4.x signature (3.x returns three)
    contours, hierarchy = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Crop to the bounding box of the first contour found
    cnt = contours[0]
    x, y, w, h = cv2.boundingRect(cnt)
    correctedImage = img[y: y + h, x: x + w]
    return correctedImage
Upvotes: 5
Views: 3643
Reputation: 46620
Starting from your filtered image, here's a simple approach.
After converting to grayscale, we find the main contour that we want to keep and draw this section onto a mask. We then invert the mask, so that it is white exactly where the border is and black over the region we want to keep.
Now we simply cv2.bitwise_or() the inverted mask with the original image to get our result: the border pixels become white, and the pixels inside the contour are left unchanged.
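import cv2
import numpy as np

# Load the filtered image and convert it to grayscale
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Find the external contour(s) and fill them in white on a black mask
mask = np.zeros(image.shape, dtype=np.uint8)
cnts = cv2.findContours(gray, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# findContours returns 2 values in OpenCV 4.x and 3 values in OpenCV 3.x
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cv2.fillPoly(mask, cnts, [255,255,255])

# Invert the mask so that only the border region is white
mask = 255 - mask

# OR-ing with the image turns the border white and keeps the text intact
result = cv2.bitwise_or(image, mask)

cv2.imshow('mask', mask)
cv2.imshow('result', result)
cv2.waitKey(0)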
Upvotes: 7