Ameya Manas

Reputation: 21

Does PyTesseract OCR give the confidence of the text detection or text extraction?

When Pytesseract is used with the command pytesseract.image_to_data, it returns many values, one of which is the confidence. What does this confidence signify? Is it the confidence of the text detection, i.e. whether text is present or not?

Or is it the confidence of the actual extracted text? If anyone can also point me towards a source where the answer to this question is documented, I would be really thankful.

Upvotes: 0

Views: 2850

Answers (1)

Ahx

Reputation: 7985

  • What does this confidence signify? Is it the confidence of the text detection i.e. whether text is present or not? Or is it the confidence of the actual text extracted?

It represents the prediction accuracy of the text in the current bounding box, so it is safe to say it is the confidence of the actual extracted text. As stated here, you can use it to filter out weak detections.
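To make the structure concrete, here is a small sketch. The dict below imitates what pytesseract.image_to_data returns with Output.DICT: parallel lists, where 'conf' holds the per-word recognition confidence (0-100) and -1 marks structural rows (blocks, paragraphs, lines) that carry no text. The values are made-up placeholders, not real OCR output.

```python
# Imitation of pytesseract.image_to_data(..., output_type=Output.DICT) output.
# The numbers are illustrative placeholders, not real OCR results.
d = {
    "text": ["", "ESTATE", "AGENTS", "=!"],
    "conf": [-1, 96, 96, 48],  # -1 = structural row, 0-100 = word confidence
}

# Keep only words whose recognition confidence exceeds a threshold
words = [t for t, c in zip(d["text"], d["conf"]) if int(c) > 40]
print(words)  # ['ESTATE', 'AGENTS', '=!']
```

The int() call matters in practice: depending on the pytesseract version, 'conf' entries may come back as strings rather than integers.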

So how can we use it in an example?

Assume we have the following example:

[input image: a "SAXONS ESTATE AGENTS" sign]

And we want to recognize the text with a confidence greater than 40%.

The result will be:

[result image: bounding boxes drawn around the detected words]

If we look at the bounding boxes:

Result    Confidence
ESTATE    96
AGENTS    96
=!        48

As we can see, SAXONS is not recognized. Confidence is also a good indicator of pre-processing quality.

For instance, if we apply adaptive thresholding:

[image: adaptive-threshold output]

Result    Confidence
ESTATE    51
AGENTS    69
SAXONS    42

From the above we can see that all of the words are now recognized, but the confidence of the previously recognized words is lower than before.

The result will be:

[result image: bounding boxes drawn around all three words]
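A quick way to compare pre-processing variants is to average the per-word confidences. As a sketch, using the confidence values from the two tables above (and ignoring the -1 entries tesseract emits for non-word rows):

```python
def mean_conf(confs):
    """Average word confidence, skipping the -1 structural rows."""
    valid = [int(c) for c in confs if int(c) >= 0]
    return sum(valid) / len(valid) if valid else 0.0

plain = [96, 96, 48]        # grayscale input (SAXONS missed)
thresholded = [51, 69, 42]  # adaptive threshold (all three words found)

print(mean_conf(plain))        # 80.0
print(mean_conf(thresholded))  # 54.0
```

This shows the trade-off in the example: the adaptive threshold recovers the missing word, but at the cost of a lower average confidence on the words that were already recognized.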

Code:


# Load the libraries
import cv2
import pytesseract

# Load the image
img = cv2.imread("example_03.jpg")

# Convert it to the gray-scale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# threshold
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 71, 71)

# OCR detection
d = pytesseract.image_to_data(thr, config="--psm 6", output_type=pytesseract.Output.DICT)

# Get ROI part from the detection
n_boxes = len(d['text'])

# For each detected part
for i in range(n_boxes):

    # If the prediction confidence is greater than 40%
    if int(d['conf'][i]) > 40:

        # Print the confidence
        print("\nConfidence: {}\n".format(d['conf'][i]))

        # Get the localized region
        (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])

        # Draw rectangle to the detected region
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 5)

        # Crop the image
        crp = thr[y:y + h, x:x + w]

        # OCR
        txt = pytesseract.image_to_string(crp, config="--psm 6")
        print(txt)

        # Display the cropped
        cv2.imshow("crp", crp)
        cv2.waitKey(0)

# Display the mask
cv2.imshow("img", img)
cv2.waitKey(0)

Upvotes: 1
