caner karagüler

Reputation: 73

How to obtain the trust rate of an OCR output?

Is there a way to get the trust rate of an OCR output produced by pytesseract? By trust rate, I mean the correctness percentage of the OCR output.

Example:

text = pytesseract.image_to_string(editedImage) 

For this text string, I would also like to show the trust rate, if possible.

Edit: I tried image_to_data, but I got an error:

print(pytesseract.image_to_data(Image.open('test.png')))



Traceback (most recent call last):
  File "/usr/lib/python3.4/tkinter/__init__.py", line 1536, in __call__
    return self.func(*args)
  File "/home/caner/Desktop/Met/OCR-METv3/venv/tkgui.py", line 192, in convert
    print(pytesseract.image_to_data(Image.open('test.png')))
  File "/home/caner/Desktop/Met/OCR-METv3/venv/lib/python3.4/site-packages/pytesseract/pytesseract.py", line 232, in image_to_data
    return run_and_get_output(image, 'tsv', lang, config, nice)
  File "/home/caner/Desktop/Met/OCR-METv3/venv/lib/python3.4/site-packages/pytesseract/pytesseract.py", line 142, in run_and_get_output
    with open(filename, 'rb') as output_file:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tess_2mxczh8n_out.tsv' 

Upvotes: 4

Views: 3385

Answers (1)

neznidalibor

Reputation: 175

My guess is that by "trust rate" you mean confidence. There is some information about this in the pytesseract repository here.
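Keep in mind that a confidence score is Tesseract's own estimate, not a measured correctness percentage: true correctness can only be computed when you already know what the text should say. If you do have a reference transcription (for example, for a few test images), a common measure is character accuracy based on edit distance. The sketch below assumes such a reference exists; `levenshtein` and `character_accuracy` are hypothetical helper names, not part of pytesseract.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic DP, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(ocr_text, ground_truth):
    """Percentage of the reference text the OCR output got right:
    100 * (1 - edit_distance / len(ground_truth)), floored at 0."""
    if not ground_truth:
        return 100.0 if not ocr_text else 0.0
    errors = levenshtein(ocr_text, ground_truth)
    return max(0.0, 100.0 * (1 - errors / len(ground_truth)))
```

For example, `character_accuracy(text.strip(), "expected text")`, with `text` from `image_to_string` and `"expected text"` standing in for your own ground truth. `character_accuracy("Hel1o", "Hello")` returns 80.0 (one substitution in five characters).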

Functions

  • image_to_string Returns the result of a Tesseract OCR run on the image to string
  • image_to_boxes Returns result containing recognized characters and their box boundaries
  • image_to_data Returns result containing box boundaries, confidences, and other information. Requires Tesseract 3.05+. For more information, please check the Tesseract TSV documentation
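One plausible cause of the FileNotFoundError in the question (an assumption, not confirmed in this thread) is an installed Tesseract older than 3.05, which does not write the .tsv file that image_to_data then tries to open. A minimal sketch for checking this, where `supports_tsv` is a hypothetical helper that parses the text printed by `tesseract --version`:

```python
import re


def supports_tsv(version_output):
    """Return True if the `tesseract --version` text reports version
    3.05 or newer (the minimum the pytesseract README gives for
    image_to_data)."""
    # Matches e.g. "tesseract 3.04.01", "tesseract 4.1.1", "tesseract v5.3.0"
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)", version_output,
                      re.IGNORECASE)
    if match is None:
        raise ValueError("could not find a Tesseract version number")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (3, 5)
```

You could feed it the output of `subprocess.check_output(["tesseract", "--version"], stderr=subprocess.STDOUT, universal_newlines=True)`; stderr is merged in because some Tesseract versions print their version there. If it returns False, upgrading Tesseract is the first thing to try.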

I think what you're looking for is the image_to_data function.
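As a sketch, assuming the default TSV string that image_to_data returns (`word_confidences` and `mean_confidence` are hypothetical helpers), you can parse the `conf` column and average it over the recognized words. Rows for pages, blocks, paragraphs and lines carry a `conf` of -1 and are skipped. Averaging is only one way to turn per-word scores into a single number; taking the minimum would be a stricter alternative.

```python
import csv
import io


def word_confidences(tsv_text):
    """Return (word, confidence) pairs from the TSV string returned by
    pytesseract.image_to_data().

    Non-word rows (conf == -1) and blank detections are skipped.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        conf = float(row["conf"])  # int in older Tesseract, float in newer
        text = (row.get("text") or "").strip()
        if conf >= 0 and text:
            pairs.append((text, conf))
    return pairs


def mean_confidence(tsv_text):
    """Average word confidence (0 to 100), or None if no words were found."""
    confs = [conf for _, conf in word_confidences(tsv_text)]
    return sum(confs) / len(confs) if confs else None
```

For example, `mean_confidence(pytesseract.image_to_data(editedImage))`. This is still Tesseract's confidence estimate, not a guaranteed correctness percentage.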

Upvotes: 4
