Ameya Manas

Reputation: 21

Does PyTesseract OCR give the confidence of the text detection or text extraction?

When Pytesseract is used with the command pytesseract.image_to_data, it returns many values, one of which is the confidence. What does this confidence signify? Is it the confidence of the text detection, i.e. whether text is present or not?

Or is it the confidence of the actual extracted text? If anyone can also point me towards a source where the answer to this question is documented, I would be really thankful.

Upvotes: 0

Views: 2850

Answers (1)

Ahx

Reputation: 7985

  • What does this confidence signify? Is it the confidence of the text detection i.e. whether text is present or not? Or is it the confidence of the actual text extracted?

It represents the prediction accuracy of the text in the current bounding box, so it is safe to say it is the confidence of the actual extracted text. As stated here, you can use it to filter out weak detections.
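To make the structure concrete, here is a small sketch. The dict below imitates what pytesseract.image_to_data returns with Output.DICT: parallel lists, where 'conf' holds the per-word recognition confidence (0-100) and -1 marks structural rows (blocks, paragraphs, lines) that carry no text. The values are made-up placeholders, not real OCR output.

```python
# Imitation of pytesseract.image_to_data(..., output_type=Output.DICT) output.
# The numbers are illustrative placeholders, not real OCR results.
d = {
    "text": ["", "ESTATE", "AGENTS", "=!"],
    "conf": [-1, 96, 96, 48],  # -1 = structural row, 0-100 = word confidence
}

# Keep only words whose recognition confidence exceeds a threshold
words = [t for t, c in zip(d["text"], d["conf"]) if int(c) > 40]
print(words)  # ['ESTATE', 'AGENTS', '=!']
```

The int() call matters in practice: depending on the pytesseract version, 'conf' entries may come back as strings rather than integers.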

So how can we use it in an example?

Assume we have the following example:

[input image: a "SAXONS ESTATE AGENTS" sign]

And we want to recognize the text with a confidence greater than 40%.

The result will be:

[result image: bounding boxes drawn around the detected words]

If we look at the bounding boxes:

Result    Confidence
ESTATE    96
AGENTS    96
=!        48

As we can see, SAXONS is not recognized. Confidence is also a good indicator of pre-processing quality.

For instance, if we apply adaptive thresholding:

[image: adaptive-threshold output]

Result    Confidence
ESTATE    51
AGENTS    69
SAXONS    42

From the above we can see that all of the words are now recognized, but the confidence of the previously recognized words is lower than before.

The result will be:

[result image: bounding boxes drawn around all three words]
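A quick way to compare pre-processing variants is to average the per-word confidences. As a sketch, using the confidence values from the two tables above (and ignoring the -1 entries tesseract emits for non-word rows):

```python
def mean_conf(confs):
    """Average word confidence, skipping the -1 structural rows."""
    valid = [int(c) for c in confs if int(c) >= 0]
    return sum(valid) / len(valid) if valid else 0.0

plain = [96, 96, 48]        # grayscale input (SAXONS missed)
thresholded = [51, 69, 42]  # adaptive threshold (all three words found)

print(mean_conf(plain))        # 80.0
print(mean_conf(thresholded))  # 54.0
```

This shows the trade-off in the example: the adaptive threshold recovers the missing word, but at the cost of a lower average confidence on the words that were already recognized.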

Code:


# Load the libraries
import cv2
import pytesseract

# Load the image
img = cv2.imread("example_03.jpg")

# Convert it to the gray-scale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# threshold
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 71, 71)

# OCR detection
d = pytesseract.image_to_data(thr, config="--psm 6", output_type=pytesseract.Output.DICT)

# Get ROI part from the detection
n_boxes = len(d['text'])

# For each detected part
for i in range(n_boxes):

    # If the prediction confidence is greater than 40%
    if int(d['conf'][i]) > 40:

        # Print the confidence
        print("\nConfidence: {}\n".format(d['conf'][i]))

        # Get the localized region
        (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])

        # Draw rectangle to the detected region
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 5)

        # Crop the image
        crp = thr[y:y + h, x:x + w]

        # OCR
        txt = pytesseract.image_to_string(crp, config="--psm 6")
        print(txt)

        # Display the cropped
        cv2.imshow("crp", crp)
        cv2.waitKey(0)

# Display the mask
cv2.imshow("img", img)
cv2.waitKey(0)

Upvotes: 1
