William Lombard

Reputation: 347

Getting 'confidence' value for OCR text - Python/Azure

I've managed to put together enough Python code to capture an image, store it locally, treat it to remove noise etc and then run Azure OCR API (in PyCharm, using Python 3.9) to get back the text it contains.

Here is an approximate example of the images I have to work with. The only information I want to extract is the text on the left-hand side (2nd line); I should also point out that it is to be read as a single word. All other text can be ignored.

[example image]

Here is the main code for the OCR part of the script

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials

import os
import time

endpoint = 'https://xxxxxxx.cognitiveservices.azure.com/'
subscription_key = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxx'

computervision_client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(subscription_key))


# Get image path
read_image_path = os.path.join("file.png")
# Open the image
read_image = open(read_image_path, "rb")


# Call the API with the image stream and raw response (allows you to get the operation location)
read_response = computervision_client.read_in_stream(read_image, raw=True)

# Get the operation location (URL with an ID at the end) from the response
read_operation_location = read_response.headers["Operation-Location"]
# Grab the ID from the URL
operation_id = read_operation_location.split("/")[-1]


# Call the "GET" API and wait for it to retrieve the results
while True:
    read_result = computervision_client.get_read_result(operation_id)
    # print(read_result)
    if read_result.status not in ['notStarted', 'running']:
        break
    time.sleep(1)

ocrText = read_result.analyze_result.read_results[0].lines[1].text

print(ocrText)

This is all well and good, but how do I get the 'confidence' value of the result, and print it to the screen etc., like the variable 'ocrText'?

I've printed out the 'raw' analysis of an OCR call and can see a confidence value lurks in there at the word level, but I cannot figure out how to extract it! The analysis dictionary is extremely "nested"!

{'status': 'succeeded', 'createdDateTime': '2022-03-11T20:15:45Z', 'lastUpdatedDateTime': '2022-03-11T20:15:46Z',
 'analyzeResult': {'version': '3.2.0', 'modelVersion': '2021-04-12',
  'readResults': [{'page': 1, 'angle': -0.1123, 'width': 828, 'height': 536, 'unit': 'pixel',
   'lines': [{'boundingBox': [204, 94, 731, 94, 733, 243, 206, 248], 'text': 'oren',
     'appearance': {'style': {'name': 'other', 'confidence': 0.878}},
     'words': [{'boundingBox': [204, 94, 673, 99, 674, 235, 210, 249], 'text': 'oren', 'confidence': 0.993}]},
    {'boundingBox': [209, 304, 719, 304, 720, 452, 210, 454], 'text': 'osun',
     'appearance': {'style': {'name': 'other', 'confidence': 0.878}},
     'words': [{'boundingBox': [209, 304, 645, 311, 646, 446, 216, 455], 'text': 'osun', 'confidence': 0.191}]}]}]}}

Upvotes: 0

Views: 1178

Answers (2)

Arunava Maulik

Reputation: 63

If it is still relevant for you, something like this should work:

if read_result.status == OperationStatusCodes.succeeded:
    for text_result in read_result.analyze_result.read_results:
        for line in text_result.lines:
            for index in range(len(line.words)):
                print(line.words[index].text)
                print(line.words[index].bounding_box)
                print(line.words[index].confidence)
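If you only want the single value that matches the existing `ocrText` lookup, the same confidence can also be dug out of the raw analysis dict printed in the question. A self-contained sketch, using a trimmed copy of that output (with the SDK objects, the equivalent path is `lines[1].words[0].confidence`):

```python
# Trimmed copy of the raw analysis dict from the question.
analysis = {
    'analyzeResult': {
        'readResults': [{
            'lines': [
                {'text': 'oren', 'words': [{'text': 'oren', 'confidence': 0.993}]},
                {'text': 'osun', 'words': [{'text': 'osun', 'confidence': 0.191}]},
            ]
        }]
    }
}

# Same path as read_results[0].lines[1].text, but on plain dicts.
line = analysis['analyzeResult']['readResults'][0]['lines'][1]
ocr_text = line['text']
ocr_confidence = line['words'][0]['confidence']  # word-level confidence
print(ocr_text, ocr_confidence)  # osun 0.191
```

Since your target line is a single word, `words[0]` is all you need; for multi-word lines you would aggregate over `line['words']` yourself (e.g. take the minimum).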

Upvotes: 1

RKM

Reputation: 1389

There's no built-in function to extract a specific portion of an image using Pytesseract, but we can use OpenCV to extract the ROI bounding box and then feed that ROI into Pytesseract.

We convert the image to grayscale, then threshold to obtain a binary image. Assuming you have the desired ROI coordinates, we use NumPy slicing to extract it.

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

image = cv2.imread('1.jpg', 0)
thresh = 255 - cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

x, y, w, h = 37, 625, 309, 28
ROI = thresh[y:y+h, x:x+w]
data = pytesseract.image_to_string(ROI, lang='eng',config='--psm 6')
print(data)

cv2.imshow('thresh', thresh)
cv2.imshow('ROI', ROI)
cv2.waitKey()
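Since the original question was about confidence: Tesseract can report word-level confidences too, via `pytesseract.image_to_data`. A sketch of how to read them; note the `details` dict below is made-up sample data standing in for a real call such as `pytesseract.image_to_data(ROI, config='--psm 6', output_type=pytesseract.Output.DICT)`:

```python
# image_to_data(..., output_type=Output.DICT) returns parallel lists, where
# 'conf' is -1 for rows that are not words. These values are invented samples.
details = {
    'text': ['', 'osun', ''],
    'conf': [-1, 72, -1],
}

# Pair each real word with its confidence, skipping non-word rows.
words_with_conf = [(word, conf)
                   for word, conf in zip(details['text'], details['conf'])
                   if word.strip()]
print(words_with_conf)  # [('osun', 72)]
```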

Upvotes: 0
