Extracting text clearly from an image in Python

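I used pytesseract to extract text from an image. First I pointed it at the Tesseract executable:

import pytesseract
from PIL import Image

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

Then I used the code below to extract the text and save it to a file:

# Run OCR on the input image
textImg = pytesseract.image_to_string(Image.open(imgLoc + "/" + imgName))
print(textImg)

# Write the recognised text to a file next to the image
with open(imgLoc + "/" + "oriText.txt", "w", encoding="utf-8") as text_file:
    text_file.write(textImg)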

This is my input image:

[image]

This is an image of my output text file:

[image]

Is there any way to extract the text from the image more accurately?

Upvotes: 0

Views: 1042

Answers (1)

Vasu Deo.S

Reputation: 1850

You can try improving the results by restricting the character set, so that Tesseract only considers characters that are legal in your particular language (for example, excluding digits and special characters). This answer will help.
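As a rough sketch of the character-set idea, Tesseract has a `tessedit_char_whitelist` variable that pytesseract can pass through its `config` argument. The helper names (`whitelist_config`, `ocr_letters_only`), the letters-only whitelist and the `--psm 6` page segmentation mode are assumptions for illustration, not something taken from the question. Note also that Tesseract 4.0's LSTM engine reportedly ignores the whitelist, so this may only have an effect on 3.x or 4.1 and later.

```python
import string


def whitelist_config(allowed_chars, psm=6):
    """Build a Tesseract config string that restricts the recognised characters.

    psm=6 (treat the image as one uniform block of text) is an assumed default.
    """
    if not allowed_chars or any(c.isspace() for c in allowed_chars):
        raise ValueError("allowed_chars must be non-empty and contain no whitespace")
    return f"--psm {psm} -c tessedit_char_whitelist={allowed_chars}"


def ocr_letters_only(image_path):
    """Run OCR while allowing only ASCII letters (hypothetical helper)."""
    # Imported here so whitelist_config() can be used without these packages.
    import pytesseract
    from PIL import Image

    config = whitelist_config(string.ascii_letters)
    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, config=config)
```

Calling `ocr_letters_only(imgLoc + "/" + imgName)` would then replace the plain `image_to_string` call in the question. If the text contains digits or punctuation, they have to be added to the whitelist, otherwise Tesseract will force them into the nearest allowed letter.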

Tesseract OCR isn't the best at recognising characters in an image. You can try preprocessing the image a bit in order to improve the results. This will help.

  • Make sure the image DPI/PPI is above 250; otherwise the results may be inaccurate.
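The preprocessing and DPI advice above could look something like the sketch below, using Pillow. The target of 300 DPI, the fallback of 72 DPI when the file carries no DPI metadata, the threshold of 150 and the function names are all assumptions to tune for your own images. Upscaling does not add real detail, but it often gives Tesseract larger glyphs to work with.

```python
def upscale_factor(current_dpi, target_dpi=300):
    """Return how much to enlarge an image to reach target_dpi (never shrink)."""
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    return max(1.0, target_dpi / current_dpi)


def preprocess_for_ocr(image_path, target_dpi=300, assumed_dpi=72, threshold=150):
    """Grayscale, upscale and binarise an image before passing it to Tesseract."""
    # Imported here so upscale_factor() can be used without Pillow installed.
    from PIL import Image, ImageOps

    with Image.open(image_path) as img:
        # Many files carry no DPI metadata; fall back to an assumed value.
        dpi = float(img.info.get("dpi", (assumed_dpi, assumed_dpi))[0])
        if dpi <= 0:
            dpi = float(assumed_dpi)
        gray = ImageOps.grayscale(img)

    # Stretch the contrast so faint text stands out from the background.
    gray = ImageOps.autocontrast(gray)

    # Enlarge low-resolution images towards the target DPI.
    factor = upscale_factor(dpi, target_dpi)
    if factor > 1.0:
        new_size = (round(gray.width * factor), round(gray.height * factor))
        gray = gray.resize(new_size, Image.LANCZOS)

    # Simple global threshold: pixels become pure black or pure white.
    return gray.point(lambda p: 255 if p > threshold else 0)
```

The returned image can be passed straight to `pytesseract.image_to_string(preprocess_for_ocr(path))`. A fixed global threshold is the simplest choice and may do badly on unevenly lit photos, where an adaptive threshold would probably work better.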

I generally prefer the website www.onlineocr.net for optical character recognition, as the results are almost perfect each time. You can try using its API for character recognition (it requires internet connectivity to function). The results I obtained with this API are far superior to those from Tesseract OCR, so you may want to give it a try.

Upvotes: 1
