Reputation: 23
Is it possible to make all the text in a document black on white after thresholding? I've been looking online a lot but I haven't been able to find a solution. My current thresholded image is: https://i.ibb.co/Rpqcp7v/thresh.jpg
The document needs to be read by an OCR engine, and for that I need the areas that are currently white on black to be inverted. How would I go about doing this? My current code:
import cv2

# thresholding
def thresholding(image):
    # thresholds the image into a binary image (black and white)
    return cv2.threshold(image, 120, 255, cv2.THRESH_BINARY)[1]
Upvotes: 2
Views: 1717
Reputation: 15561
Use a median filter to estimate the dominant color (background).
Then take the absolute difference between that background estimate and the image. The text comes out white on a black background, whichever way round it was originally. Invert the result for black on white.
import cv2 as cv

im = cv.imread("thresh.jpg", cv.IMREAD_GRAYSCALE)
im = cv.pyrDown(cv.pyrDown(im))  # picture too large for Stack Overflow
bg = cv.medianBlur(im, 51)  # odd kernel size, suitably large to cover all text
out = 255 - cv.absdiff(bg, im)  # absolute difference, inverted: black text on white
Upvotes: 6