PIL produces grey pixels in OpenCV image

Question

I get strange output in my images: all the characters are surrounded by grey pixels. I am 90% sure this is caused by the OpenCV-to-PIL conversion, but I don't know how to solve it.

Here is the source image:

And the output (you need to zoom in to see the grey pixels):

A detail:

This is the code I am using:

import cv2
import tesserocr as tr
from PIL import Image
import os

src = os.path.join(os.path.expanduser('~'), 'Desktop', 'output4')

causali = os.listdir(src)  # build the list of "causale" files
causali.sort(key=lambda x: int(x.split('.')[0]))

for file in enumerate(causali):  # count the files: file is an (index, filename) tuple

    cv_img = cv2.imread(os.path.join(src, file[1]), cv2.IMREAD_UNCHANGED)

    # since tesserocr accepts PIL images, converting opencv image to pil
    pil_img = Image.fromarray(cv2.cvtColor(cv_img, cv2.COLOR_BGR2RGB))

    # initialize api
    api = tr.PyTessBaseAPI()
    try:
        # set pil image for ocr
        api.SetImage(pil_img)
        # Google tesseract-ocr has a page segmentation method(psm) option for specifying ocr types
        # psm values can be: block of text, single text line, single word, single character etc.
        # api.GetComponentImages method exposes this functionality
        # function returns:
        # image (:class:`PIL.Image`): Image object.
        # bounding box (dict): dict with x, y, w, h keys.
        # block id (int): textline block id (if blockids is ``True``). ``None`` otherwise.
        # paragraph id (int): textline paragraph id within its block (if paraids is True).
        # ``None`` otherwise.
        boxes = api.GetComponentImages(tr.RIL.BLOCK, True)
        # get text
        text = api.GetUTF8Text()
        # iterate over returned list, draw rectangles
        for (im, box, _, _) in boxes:
            x, y, w, h = box['x'], box['y'], box['w'], box['h']

            cv_rect = cv2.rectangle(cv_img, (x-10, y-10), (x + w+10, y + h+10), color=(255, 255, 255), thickness=1)

            im.save(os.path.expanduser('~\\Desktop\\output5\\{}.png').format(file[0]))

    finally:
        api.End()

Is there a way to make api.SetImage() accept an OpenCV image directly?

Thanks

EDIT: Is there a way to delete all the grey pixels by specifying their color?

Alex W · Accepted Answer

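You need to use a binary thresholding algorithm to filter out the "noise" in your image. Thresholding forces every pixel to either black or white, so the intermediate grey values disappear.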

C++ docs

Python docs
