Reputation: 73
Is there a way to get the "trust rate" of an OCR output produced by Pytesseract?
What I mean by the trust rate is the correctness percentage of the OCR output.
Example:
text = pytesseract.image_to_string(editedImage)
For this text string I also want to show the trust rate if it is possible.
Edit: I tried image_to_data, but I got an error:
print(pytesseract.image_to_data(Image.open('test.png')))
Traceback (most recent call last):
  File "/usr/lib/python3.4/tkinter/__init__.py", line 1536, in __call__
    return self.func(*args)
  File "/home/caner/Desktop/Met/OCR-METv3/venv/tkgui.py", line 192, in convert
    print(pytesseract.image_to_data(Image.open('test.png')))
  File "/home/caner/Desktop/Met/OCR-METv3/venv/lib/python3.4/site-packages/pytesseract/pytesseract.py", line 232, in image_to_data
    return run_and_get_output(image, 'tsv', lang, config, nice)
  File "/home/caner/Desktop/Met/OCR-METv3/venv/lib/python3.4/site-packages/pytesseract/pytesseract.py", line 142, in run_and_get_output
    with open(filename, 'rb') as output_file:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tess_2mxczh8n_out.tsv'
Upvotes: 4
Views: 3385
Reputation: 175
My guess is that by "trust rate" you mean the confidence value.
The README in the pytesseract repository has some information on this.
Functions
- image_to_string: Returns the result of a Tesseract OCR run on the image as a string.
- image_to_boxes: Returns a result containing recognized characters and their box boundaries.
- image_to_data: Returns a result containing box boundaries, confidences, and other information. Requires Tesseract 3.05+. For more information, please check the Tesseract TSV documentation.
I think what you're looking for is the image_to_data function.
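As a rough sketch of how that could look: by default image_to_data returns tab-separated text with one row per detected element, including a conf column and a text column. The helpers below (word_confidences, mean_confidence and ocr_with_confidence are names I made up, not part of pytesseract) parse that text and average the per-word confidences. Keep in mind that this is Tesseract's own estimate, not a measured correctness percentage, and that it assumes your Tesseract is 3.05 or newer.

```python
import csv
import io


def word_confidences(tsv_text):
    """Parse the TSV text returned by pytesseract.image_to_data and
    return a list of (word, confidence) pairs for recognized words."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    pairs = []
    for row in reader:
        word = (row.get("text") or "").strip()
        conf = float(row["conf"])
        # Rows that describe pages, blocks or lines carry conf = -1
        # and no text, so they are skipped here.
        if word and conf >= 0:
            pairs.append((word, conf))
    return pairs


def mean_confidence(pairs):
    """Average the confidences; returns None if no words were found."""
    if not pairs:
        return None
    return sum(conf for _, conf in pairs) / len(pairs)


def ocr_with_confidence(image):
    """Hypothetical convenience wrapper: run OCR on a PIL image and
    return (text, mean confidence)."""
    # Imported here so the parsing helpers work without Tesseract installed.
    import pytesseract

    pairs = word_confidences(pytesseract.image_to_data(image))
    text = " ".join(word for word, _ in pairs)
    return text, mean_confidence(pairs)
```

You would then call it as text, conf = ocr_with_confidence(Image.open('test.png')). A plain mean treats every word equally; weighting by word length or reporting the minimum may suit some use cases better.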
Upvotes: 4