Reputation: 5117
I am working with the Google Vision API and Python to apply text_detection, the OCR function of the Google Vision API that detects the text in an image and returns it as output. My original image is the following:
I have used the following different algorithms:
1) Apply text_detection to the original image
2) Enlarge the original image by 3 times and then apply text_detection
3) Apply Canny, findContours, drawContours on a mask (with OpenCV) and then apply text_detection to this
4) Enlarge the original image by 3 times, apply Canny, findContours, drawContours on a mask (with OpenCV) and then apply text_detection to this
5) Sharpen the original image and then apply text_detection
6) Enlarge the original image by 3 times, sharpen the image and then apply text_detection
The ones which fare best are (2) and (5); a minimal sketch of what they look like combined is shown below. On the other hand, (3) and (4) are probably the worst among them.
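For reference, here is a minimal sketch of (2) and (5) combined; the 3x resize, the sharpening kernel and the google-cloud-vision client calls are assumptions about the setup rather than my exact code:

import cv2
import numpy as np
from google.cloud import vision

# Approach (2) + (5): enlarge the box image by 3 times, then sharpen it.
img = cv2.imread("path/to/box.png")
img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)

# A simple sharpening kernel (one of several possible choices).
kernel = np.array([[0, -1, 0],
                   [-1, 5, -1],
                   [0, -1, 0]], dtype=np.float32)
img = cv2.filter2D(img, -1, kernel)

# Encode the preprocessed image and send it to text_detection.
ok, buf = cv2.imencode(".png", img)
client = vision.ImageAnnotatorClient()
response = client.text_detection(image=vision.Image(content=buf.tobytes()))
if response.text_annotations:
    print(response.text_annotations[0].description)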
The major problem is that text_detection in most cases does not detect the minus sign, especially the one of '-1.00'. Also, I do not know why, but sometimes it does not detect '-1.00' at all, which is quite surprising as it has no significant problem with the other numbers.
What do you suggest I do to accurately detect the minus sign and, in general, the numbers?
(Keep in mind that I want to apply this algorithm to different boxes, so the numbers may not be at the same position as in this image.)
Upvotes: 1
Views: 6023
Reputation: 86
I dealt with the same problem. Your end goal is to correctly identify the text, and for the OCR conversion you are using a third-party service or tool (Google API, Tesseract, etc.).
All the approaches you are talking about become meaningless, because whatever transformations you do with OpenCV will be repeated by Tesseract. The best you can do is supply the input in an easy format.
What worked best for me was breaking the image into parts (boxes: squares and rectangles, using the sample code for identifying rectangles in all channels from the OpenCV repo examples, https://github.com/opencv/opencv/blob/master/samples/python/squares.py), then cropping each part and sending it for OCR part by part, as sketched below.
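A rough sketch of that idea with OpenCV; find_boxes here is a hypothetical simplification of the squares.py sample, not the sample itself:

import cv2

def find_boxes(img, min_area=1000):
    # Threshold and look for rectangular outer contours, similar in spirit to squares.py.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    boxes = []
    for cnt in contours:
        approx = cv2.approxPolyDP(cnt, 0.02 * cv2.arcLength(cnt, True), True)
        if len(approx) == 4 and cv2.contourArea(approx) > min_area:
            boxes.append(cv2.boundingRect(approx))
    return boxes

img = cv2.imread("path/to/img.jpg")
for (x, y, w, h) in find_boxes(img):
    crop = img[y:y + h, x:x + w]
    # send each crop to the OCR service (Google Vision, Tesseract, ...) on its own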
Upvotes: 2
Reputation: 4236
If the Google Vision API uses Tesseract, which I think it does, then the usual Tesseract optimization advice applies.
As for the negative signs, use Tesseract directly if you can. You will be able to retrain it or download better training data. Alternatively, you can correct the errors with an additional algorithm, i.e. implement your recheck as suggested in ZdaR's answer. A small sketch of the direct Tesseract route is shown below.
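For example, a minimal sketch with pytesseract (assuming it and Tesseract are installed; the --psm mode and the character whitelist are illustrative choices, not guaranteed settings):

import cv2
import pytesseract

img = cv2.imread("path/to/img.jpg", 0)

# Restrict Tesseract to digits, the dot and the minus sign, so '-1.00'
# is less likely to be dropped or misread. --psm 6 treats the image as
# a single uniform block of text.
config = "--psm 6 -c tessedit_char_whitelist=0123456789.-"
text = pytesseract.image_to_string(img, config=config)
print(text)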
Upvotes: 0
Reputation: 22954
Since you are using the Google Vision API to detect text in the image, it is not obvious that a text-detection API will pick up negative signs in the first place. Assuming you cannot re-train the API for your case, I would recommend writing a simple script that filters the contours on the basis of their shape and size. With this script you can easily segment out the negative signs and then merge them with the output from the Google Vision API:
import cv2
import numpy as np

# Read the image in grayscale.
img = cv2.imread("path/to/img.jpg", 0)

# Binarize so that the dark glyphs stand out from the background.
ret, thresh = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY)

# cv2.findContours returns (image, contours, hierarchy) in OpenCV 3.x and
# (contours, hierarchy) in 2.x/4.x; taking [-2:] works for all versions.
contours, hierarchy = cv2.findContours(thresh.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)[-2:]

# Filter the contours: a minus sign is a small blob that is much wider than it is tall.
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    if 5 < cv2.contourArea(cnt) < 50 and float(w) / h > 3:
        print("I have detected a minus sign at:", x, y, w, h)
After this filtering process you can make a calculated guess as to whether a given digit has a negative sign close to its left side, for example as sketched below.
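A hedged sketch of that merge step; digits is assumed to be the Vision API output flattened into (text, bounding box) tuples and minus_boxes the boxes printed by the script above, so these names are illustrative only:

def attach_minus_signs(digits, minus_boxes, max_gap=15):
    # digits: list of (text, (x, y, w, h)); minus_boxes: list of (x, y, w, h).
    merged = []
    for text, (dx, dy, dw, dh) in digits:
        for (mx, my, mw, mh) in minus_boxes:
            # A minus belongs to a number if it sits just to the left of it,
            # roughly at the same vertical position.
            if 0 < dx - (mx + mw) < max_gap and abs((my + mh / 2.0) - (dy + dh / 2.0)) < dh:
                text = "-" + text
                break
        merged.append(text)
    return merged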
Upvotes: 0