Sander Berntsen

Reputation: 23

OpenCV process all text to be black on white (segmentation)

Is it possible to make all text in a document black on white after thresholding? I've been looking online a lot, but I haven't been able to find a solution. My current thresholded image is: https://i.ibb.co/Rpqcp7v/thresh.jpg

The document needs to be read by an OCR engine, so the areas that are currently white on black need to be inverted. How would I go about doing this? My current code:

import cv2

# thresholding
def thresholding(image):
    # Global threshold at 120: pixels above 120 become 255 (white), all others 0 (black)
    return cv2.threshold(image, 120, 255, cv2.THRESH_BINARY)[1]

Upvotes: 2

Views: 1717

Answers (1)

Christoph Rackwitz

Reputation: 15561

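Use a median filter with a large kernel to estimate the dominant local color (the background).

Then take the difference between the image and that background estimate. You'll get white text on a black background. I'm using the absolute difference, so it doesn't matter whether the text is darker or lighter than its background. Invert the result for black on white.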

import cv2 as cv

im = cv.imread("thresh.jpg", cv.IMREAD_GRAYSCALE)
im = cv.pyrDown(cv.pyrDown(im))  # picture too large for Stack Overflow
bg = cv.medianBlur(im, 51)  # suitably large (odd) kernel to cover all text
out = 255 - cv.absdiff(bg, im)  # invert: black text on white background


Upvotes: 6
