Outcast

Reputation: 5117

How to improve OCR results with Google Vision API and Python?

I am working with the Google Vision API and Python to apply text_detection, its OCR function, which detects the text in an image and returns it as output. My original image is the following:

[original image]
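
For context, I call text_detection roughly like this (standard google-cloud-vision client usage; the exact Image class import may differ between library versions):

from google.cloud import vision

client = vision.ImageAnnotatorClient()

# read the image from disk and wrap it for the API
with open("path/to/img.jpg", "rb") as f:
    content = f.read()
image = vision.Image(content=content)

# run OCR and print every detected text block
response = client.text_detection(image=image)
for text in response.text_annotations:
    print(text.description)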

I have used the following different algorithms:

1) Apply text_detection to the original image

2) Enlarge the original image by 3 times and then apply text_detection

3) Apply Canny, findContours and drawContours on a mask (with OpenCV) and then apply text_detection to the result

4) Enlarge the original image by 3 times, apply Canny, findContours and drawContours on a mask (with OpenCV) and then apply text_detection to the result

5) Sharpen the original image and then apply text_detection

6) Enlarge the original image by 3 times, sharpen the image and then apply text_detection

The ones that fare best are (2) and (5). On the other hand, (3) and (4) are probably the worst among them.
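
For reference, this is roughly how I implement (2) and (5) before calling text_detection (the interpolation mode and sharpening kernel are my own choices, not a fixed recipe):

import cv2
import numpy as np

img = cv2.imread("path/to/img.jpg")

# (2) enlarge the image by a factor of 3
enlarged = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)

# (5) sharpen the original image with a basic sharpening kernel
kernel = np.array([[0, -1, 0],
                   [-1, 5, -1],
                   [0, -1, 0]])
sharpened = cv2.filter2D(img, -1, kernel)

cv2.imwrite("enlarged.jpg", enlarged)
cv2.imwrite("sharpened.jpg", sharpened)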

The major problem is that text_detection in most cases does not detect the minus sign, especially the one in '-1.00'. Also, I do not know why, but sometimes it does not detect '-1.00' at all, which is quite surprising as it has no significant problem with the other numbers.

What do you suggest I do to accurately detect the minus sign and, in general, the numbers?

(Keep in mind that I want to apply this algorithm to different boxes, so the numbers may not be at the same position as in this image.)

Upvotes: 1

Views: 6023

Answers (3)

I dealt with the same problem. Your end goal is to correctly identify the text, and for the OCR conversion you are using a third-party service or tool (Google API / Tesseract etc.).

All the approaches you are talking about become meaningless, because whatever transformations you do with OpenCV will be repeated by Tesseract anyway. The best you can do is supply the input in an easy format.

What worked best for me was breaking the image into parts (boxes - squares and rectangles - using the sample code for identifying rectangles in all channels from the OpenCV repo examples: https://github.com/opencv/opencv/blob/master/samples/python/squares.py), then cropping each box and sending it for OCR part by part.
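
A rough sketch of the crop-and-OCR-by-parts idea (the box detection below is a simplified contour-based stand-in for the squares.py sample, and ocr_box is a hypothetical placeholder for whatever OCR call you use):

import cv2

def ocr_box(crop):
    # hypothetical placeholder: send this crop to the Google Vision API / Tesseract
    # and return the recognised text
    raise NotImplementedError

img = cv2.imread("path/to/img.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)

# OpenCV 4.x returns (contours, hierarchy); OpenCV 3.x adds an extra first value
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

results = []
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    if w > 20 and h > 10:  # keep only box-sized regions
        crop = img[y:y + h, x:x + w]
        results.append(((x, y, w, h), ocr_box(crop)))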

Upvotes: 2

Dalen

Reputation: 4236

If the Google Vision API uses Tesseract, which I think it does, then optimization usually goes as follows (a rough OpenCV sketch of some of these steps is shown after the list):

  1. Sharpen
  2. Binarize (or grayscale if you must)
  3. Trim borders (Tesseract likes a smooth background)
  4. Deskew (Tesseract tolerates only a very small skew angle; it likes nice, straight text lines)
  5. Reshape and resize (put it in a page-like shape and resize if necessary)
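
A rough OpenCV sketch of steps 2 and 4 (binarize and deskew); the deskew part follows the common minAreaRect recipe and assumes the angle convention of OpenCV versions before 4.5:

import cv2
import numpy as np

img = cv2.imread("path/to/img.jpg", 0)

# 2. binarize; Otsu picks the threshold, INV makes the text white on black
ret, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# 4. deskew: fit a rotated rectangle around all foreground pixels
coords = np.column_stack(np.where(binary > 0)).astype(np.float32)
angle = cv2.minAreaRect(coords)[-1]
angle = -(90 + angle) if angle < -45 else -angle

h, w = img.shape
M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
deskewed = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)

cv2.imwrite("deskewed.jpg", deskewed)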

As for the negative signs, use Tesseract directly if you can. You will be able to retrain it or download better trained data. Or you can correct the errors with an additional algorithm, i.e. implement a recheck as suggested in ZdaR's answer.

Upvotes: 0

ZdaR

Reputation: 22954

You are using the Google Vision API, which detects text in an image, and it is not obvious for a text-detection API to pick up negative signs in the first place. Assuming that you are not able to re-train the API for your case, I would recommend writing a simple script that filters the contours on the basis of their shape and size. With this script you can easily segment out the negative signs and then merge them with the output from the Google Vision API, as follows:

import cv2
import numpy as np

# load the image in grayscale
img = cv2.imread("path/to/img.jpg", 0)

# binarize: pixels brighter than 200 become white, everything else black
ret, thresh = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY)

# OpenCV 4.x returns (contours, hierarchy); OpenCV 3.x returns an extra first value
contours, hierarchy = cv2.findContours(thresh.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)

# filter the contours: a minus sign is a small, wide and short blob
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    if 5 < cv2.contourArea(cnt) < 50 and float(w) / h > 3:
        print("I have detected a minus sign at:", x, y, w, h)

After this filtering process you can make a calculated guess whether a given digit has a negative sign close to its left side.
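
A hypothetical sketch of that merge step, assuming you already have the minus-sign boxes from the loop above and a list of (text, bounding box) tuples built from the Vision API response; the max_gap threshold is an arbitrary guess:

def merge_minus_signs(minus_boxes, number_boxes, max_gap=15):
    # minus_boxes: list of (x, y, w, h) from the contour filter above
    # number_boxes: list of (text, x, y, w, h) built from the Vision API output
    merged = []
    for text, nx, ny, nw, nh in number_boxes:
        for mx, my, mw, mh in minus_boxes:
            horizontally_close = 0 <= nx - (mx + mw) <= max_gap
            vertically_aligned = my < ny + nh and my + mh > ny
            if horizontally_close and vertically_aligned:
                text = "-" + text
                break
        merged.append((text, nx, ny, nw, nh))
    return merged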

Upvotes: 0
