Pyd

Reputation: 6159

Python accuracy for tesseract

I have run Tesseract OCR to convert an image file into a string.

Now I have the output.

How do I compare the original PNG file with the output text to check whether the OCR result is accurate?

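from PIL import Image
from pytesseract import image_to_string

basewidth = 2700
img = Image.open('D:OCR\\page1.png')
# Scale the height by the same factor as the width to keep the aspect ratio
wpercent = basewidth / float(img.size[0])
hsize = int(float(img.size[1]) * wpercent)
# Image.LANCZOS replaces Image.ANTIALIAS, which was removed in Pillow 10
img = img.resize((basewidth, hsize), Image.LANCZOS)
img.save('page1_zoom.png')
# Run OCR on the file that was just saved
print(image_to_string(Image.open('page1_zoom.png')))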

Upvotes: 0

Views: 1390

Answers (1)

Amarpreet Singh

Reputation: 2260

How to check whether something is accurate?

You will definitely need a manual baseline (golden data) to compare the results against: either the full expected text for your test images, or at least the properties you want to verify.
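As a rough sketch of such a comparison, assuming you have typed the expected text by hand, you could score the OCR output by character error rate (edit distance divided by the length of the golden text). The function names `edit_distance` and `char_accuracy` are made up for this example and are not part of Tesseract or pytesseract.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(golden: str, ocr_text: str) -> float:
    """Return 1 - character error rate, clamped to the range [0, 1]."""
    if not golden:
        return 1.0 if not ocr_text else 0.0
    cer = edit_distance(golden, ocr_text) / len(golden)
    return max(0.0, 1.0 - cer)
```

You would call it as `char_accuracy(golden_text, image_to_string(img))`. Whether to normalise whitespace or case first depends on what you count as an error.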

Test cases could be something like:
 1. Whole textual data
 2. Number of lines
 3. Number of paragraphs
 4. Position of text
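The first three checks can be sketched in plain Python. This is only an illustration: it assumes a paragraph is a run of non-empty lines separated by blank lines, and the helper names (`count_lines`, `count_paragraphs`, `compare_structure`) are hypothetical. Checking text position is left out, since it needs bounding-box data from the OCR engine rather than the plain string.

```python
def count_lines(text: str) -> int:
    """Count lines that contain at least one non-whitespace character."""
    return sum(1 for line in text.splitlines() if line.strip())


def count_paragraphs(text: str) -> int:
    """Count runs of non-empty lines separated by one or more blank lines."""
    count, in_paragraph = 0, False
    for line in text.splitlines():
        if line.strip():
            if not in_paragraph:
                count += 1
            in_paragraph = True
        else:
            in_paragraph = False
    return count


def compare_structure(golden: str, ocr_text: str) -> dict:
    """Compare OCR output with golden text; count pairs are (golden, ocr)."""
    return {
        # Word-by-word equality, ignoring differences in whitespace
        "text_matches": golden.split() == ocr_text.split(),
        "lines": (count_lines(golden), count_lines(ocr_text)),
        "paragraphs": (count_paragraphs(golden), count_paragraphs(ocr_text)),
    }
```

Each pair in the result lets you see at a glance where the OCR output diverges from the baseline, for example if two lines were merged into one.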

Tesseract vs Google OCR:

If you want to compare Tesseract's accuracy with another OCR engine, you can try Google's OCR, which often gives better results than Tesseract (Google has also long sponsored Tesseract's development).

Tesseract training:

Tesseract also supports training on your own data to improve the accuracy of its results.

Upvotes: 1
