Reputation: 2269
I get a strange result in my output images: every character is surrounded by grey pixels. I am 90% sure this is caused by an OpenCV-to-PIL conversion issue, but I don't know how to solve it.
Here is the source image:
And the output (you need to zoom in to see the grey pixels):
Here is a detail:
This is the code I am using:
import cv2
import tesserocr as tr
from PIL import Image
import os

src = os.path.expanduser('~\\Desktop\\output4\\')
causali = os.listdir(src)  # create the list of "causale" files
causali.sort(key=lambda x: int(x.split('.')[0]))

for file in enumerate(causali):  # count the causale files; file is an (index, filename) pair
    cv_img = cv2.imread(os.path.expanduser('~\\Desktop\\output4\\{}'.format(file[1])), cv2.IMREAD_UNCHANGED)
    # since tesserocr accepts PIL images, converting opencv image to pil
    pil_img = Image.fromarray(cv2.cvtColor(cv_img, cv2.COLOR_BGR2RGB))
    # initialize api
    api = tr.PyTessBaseAPI()
    try:
        # set pil image for ocr
        api.SetImage(pil_img)
        # Google tesseract-ocr has a page segmentation method (psm) option for specifying ocr types
        # psm values can be: block of text, single text line, single word, single character etc.
        # api.GetComponentImages method exposes this functionality
        # function returns:
        # image (:class:`PIL.Image`): Image object.
        # bounding box (dict): dict with x, y, w, h keys.
        # block id (int): textline block id (if blockids is ``True``). ``None`` otherwise.
        # paragraph id (int): textline paragraph id within its block (if paraids is True).
        # ``None`` otherwise.
        boxes = api.GetComponentImages(tr.RIL.BLOCK, True)
        # get text
        text = api.GetUTF8Text()
        # iterate over returned list, draw rectangles
        for (im, box, _, _) in boxes:
            x, y, w, h = box['x'], box['y'], box['w'], box['h']
            cv_rect = cv2.rectangle(cv_img, (x-10, y-10), (x + w+10, y + h+10), color=(255, 255, 255), thickness=1)
            im.save(os.path.expanduser('~\\Desktop\\output5\\{}.png').format(file[0]))
    finally:
        api.End()
Is there a way to make api.SetImage() accept an OpenCV image directly?
Thanks
EDIT: Is there a way to delete all the grey pixels by specifying their color?
Upvotes: 1
Views: 511
Reputation: 2269
So, this is my solution. I found a way to use OpenCV instead of PIL for the output, since OpenCV does not convert the image to JPEG during the process. This keeps the image clean from input to output.
Here is the code:
import cv2
import tesserocr as tr
from PIL import Image
import os

cv_img = cv2.imread('C:\\Users\\Link\\Desktop\\0.png', cv2.IMREAD_UNCHANGED)
idx = 0
# since tesserocr accepts PIL images, converting opencv image to pil
pil_img = Image.fromarray(cv_img)
# initialize api
api = tr.PyTessBaseAPI()
try:
    # set pil image for ocr
    api.SetImage(pil_img)
    # Google tesseract-ocr has a page segmentation method (psm) option for specifying ocr types
    # psm values can be: block of text, single text line, single word, single character etc.
    # api.GetComponentImages method exposes this functionality
    # function returns:
    # image (:class:`PIL.Image`): Image object.
    # bounding box (dict): dict with x, y, w, h keys.
    # block id (int): textline block id (if blockids is ``True``). ``None`` otherwise.
    # paragraph id (int): textline paragraph id within its block (if paraids is True).
    # ``None`` otherwise.
    boxes = api.GetComponentImages(tr.RIL.TEXTLINE, True)
    # get text
    text = api.GetUTF8Text()
    # iterate over returned list, draw rectangles
    for (im, box, _, _) in boxes:
        x, y, w, h = box['x'], box['y'], box['w'], box['h']
        cv_rect = cv2.rectangle(cv_img, (x-10, y-10), (x + w+10, y + h+10), color=(255, 255, 255), thickness=1)
        # crop the region of interest from the OpenCV image instead of saving the PIL crop
        roi = cv_rect[y:y + h, x:x + w]
        cv2.imwrite(os.path.expanduser('~\\Desktop\\output5\\image.png'), roi)
finally:
    api.End()
Upvotes: 0
Reputation: 38253
You need to use a binary thresholding algorithm to filter out the "noise" in your image.
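As a rough illustration of what thresholding does to the grey fringe, here is a minimal sketch. It assumes the image is a single-channel 8-bit NumPy array, which is what cv2.imread returns with cv2.IMREAD_GRAYSCALE. The helper name binarize and the cutoff of 128 are illustrative choices, not something from the posts above. OpenCV's cv2.threshold with cv2.THRESH_BINARY applies the same rule.

```python
import numpy as np


def binarize(gray, threshold=128):
    """Return a copy of `gray` in which every pixel is either 0 or 255.

    `binarize` and the default cutoff are hypothetical, chosen for illustration.
    """
    gray = np.asarray(gray, dtype=np.uint8)
    # Pixels strictly brighter than the cutoff become white, all others black.
    return np.where(gray > threshold, 255, 0).astype(np.uint8)


# Synthetic row: white background (255), light grey fringe (200), black text (0).
sample = np.array([[255, 200, 0, 200, 255]], dtype=np.uint8)
cleaned = binarize(sample, threshold=128)
print(cleaned.tolist())  # [[255, 255, 0, 255, 255]]
```

The light grey fringe ends up as white background, while the dark text pixels stay black. The right cutoff depends on how dark the fringe is in the real images, so it is worth trying a few values.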
Upvotes: 1