Reputation: 21
When Pytesseract is called with pytesseract.image_to_data,
it returns many values, one of which is the confidence. What does this confidence signify? Is it the confidence of the text detection, i.e. whether text is present or not?
Or is it the confidence of the actual text extracted? If anyone can also point me to the source where the answer to this question can be found, I would be really thankful.
Upvotes: 0
Views: 2850
Reputation: 7985
It represents the prediction accuracy of the text recognized inside the bounding box, so it is safe to say it is the confidence of the actual extracted text. As stated here, you could use it for filtering out weak detections.
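As a minimal sketch of that filtering idea, the helper below works on a dictionary shaped like the one returned by image_to_data with output_type=pytesseract.Output.DICT. The function name filter_by_confidence and the sample values are hypothetical and only for illustration; the sketch does not call Tesseract itself. It assumes that rows which are not words (page, block, paragraph, line) carry a confidence of -1, and that the confidence may arrive as an int, a float, or a string depending on the pytesseract version.

```python
def filter_by_confidence(data, min_conf=40.0):
    """Return (text, confidence) pairs whose confidence exceeds min_conf.

    `data` is assumed to have parallel 'text' and 'conf' lists, like the
    dict produced by pytesseract.image_to_data(..., output_type=Output.DICT).
    """
    kept = []
    for text, conf in zip(data["text"], data["conf"]):
        # Normalise the confidence: it may be an int, a float, or a string
        conf = float(conf)
        # Skip weak detections and rows with empty or whitespace-only text
        if conf > min_conf and text.strip():
            kept.append((text, conf))
    return kept


# Hand-written sample (not real OCR output), used only to show the shape
sample = {
    "text": ["", "HELLO", "w0rld", " "],
    "conf": ["-1", "96.5", 31, 95],
}

print(filter_by_confidence(sample))
```

Lowering min_conf lets weaker words through, so the threshold is a trade-off between missing text and accepting misreads.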
So how can we use it in an example?
Assume we have the following example:
And we want to recognize the text with a confidence greater than 40%.
The result will be:
If we look at the bounding box:
As we can see, SAXONS is not recognized. Confidence is also a good indicator of how effective the pre-processing is.
For instance, if we apply adaptive thresholding:
From the above we can see that all of the text is now recognized, but the words that were already recognized before had a higher confidence than they have now.
Result will be:
Code:
# Load the libraries
import cv2
import pytesseract

# Load the image
img = cv2.imread("example_03.jpg")

# Convert it to gray-scale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Apply adaptive thresholding (block size 71, constant 71)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 71, 71)

# Run OCR and get per-box data as a dictionary
d = pytesseract.image_to_data(thr, config="--psm 6", output_type=pytesseract.Output.DICT)

# Number of detected boxes
n_boxes = len(d['text'])

# For each detected box
for i in range(n_boxes):
    # If the confidence is greater than 40%
    # (float() is used because conf may be an int, a float, or a string,
    # depending on the pytesseract version)
    if float(d['conf'][i]) > 40:
        # Print the confidence
        print("\nConfidence: {}\n".format(d['conf'][i]))

        # Get the localized region
        (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])

        # Draw a rectangle around the detected region
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 5)

        # Crop the thresholded image to the region
        crp = thr[y:y + h, x:x + w]

        # Run OCR on the cropped region
        txt = pytesseract.image_to_string(crp, config="--psm 6")
        print(txt)

        # Display the cropped region
        cv2.imshow("crp", crp)
        cv2.waitKey(0)

# Display the image with the drawn rectangles
cv2.imshow("img", img)
cv2.waitKey(0)
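To compare pre-processing variants by confidence, as discussed above, one rough option is to average the word confidences of each run. The sketch below is only an illustration: mean_word_confidence is a hypothetical helper, the two dictionaries are made-up stand-ins for image_to_data results, and it assumes that non-word rows are reported with a confidence of -1 and should be ignored.

```python
def mean_word_confidence(data):
    """Average the confidences of word rows, ignoring rows marked with -1."""
    # conf values may be ints, floats, or strings depending on the version
    confs = [float(c) for c in data["conf"]]
    words = [c for c in confs if c >= 0]
    return sum(words) / len(words) if words else 0.0


# Made-up stand-ins for two OCR runs (not real Tesseract output)
without_threshold = {"conf": ["-1", 90, 80]}
with_threshold = {"conf": [-1, 70, 60, 50]}

print(mean_word_confidence(without_threshold))
print(mean_word_confidence(with_threshold))
```

A higher mean does not necessarily mean a better result: as in the example above, one variant may recognize more words while scoring each of them lower, so it is worth looking at the word count as well.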
Upvotes: 1