Reputation: 149
I used pytesseract to identify text in an image:
import pytesseract
from PIL import Image

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
Then I used the code below to identify the text:
textImg = pytesseract.image_to_string(Image.open(imgLoc+"/"+imgName))
print(textImg)
# Save the recognized text; the context manager closes the file automatically
with open(imgLoc+"/"+"oriText.txt", "w") as text_file:
    text_file.write(textImg)
This is my input image.
This is an image of my output text file.
Is there any way to identify the text in the image more accurately?
Upvotes: 0
Views: 1042
Reputation: 1850
You can try improving the results by restricting the character set to the characters that are valid in your particular language (excluding digits, special characters, etc.). This answer will help.
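As a rough illustration of restricting the character set, the sketch below passes Tesseract's `tessedit_char_whitelist` variable through the `config` argument of `pytesseract.image_to_string`. The helper names `build_whitelist_config` and `ocr_letters_only` are made up for this example, and ASCII letters are only an assumed character set; adjust it to your language. Note that some Tesseract 4.0 builds reportedly ignore the whitelist with the LSTM engine, so results may depend on your version.

```python
import string


def build_whitelist_config(allowed_chars):
    """Build a Tesseract config string that limits output to allowed_chars."""
    # pytesseract splits the config string on whitespace, so a whitelist
    # containing spaces would be cut apart before it reaches Tesseract.
    if any(ch.isspace() for ch in allowed_chars):
        raise ValueError("whitelist must not contain whitespace")
    return f"-c tessedit_char_whitelist={allowed_chars}"


def ocr_letters_only(image_path):
    """Run OCR on image_path, allowing only ASCII letters (hypothetical helper)."""
    # Imported lazily so the config helper works without Tesseract installed.
    import pytesseract
    from PIL import Image

    config = build_whitelist_config(string.ascii_letters)
    return pytesseract.image_to_string(Image.open(image_path), config=config)
```

With the question's variables, this could be called as `ocr_letters_only(imgLoc + "/" + imgName)`.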
Tesseract OCR isn't the best at recognizing characters in an image. You can try preprocessing the image a bit to improve the results. This will help.
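One possible form of preprocessing, sketched under my own assumptions rather than taken from the linked post, is to convert the image to grayscale, upscale it, and binarize it with Otsu's threshold before handing it to Tesseract. The function names `otsu_threshold` and `preprocess_for_ocr` and the default `scale=2` are illustrative choices; whether this helps depends on the image.

```python
def otsu_threshold(histogram):
    """Return the gray level that maximizes between-class variance (Otsu)."""
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    best_level, best_variance = 0, -1.0
    weight_bg = 0  # number of pixels at or below the candidate level
    sum_bg = 0     # sum of gray values of those pixels
    for level, count in enumerate(histogram):
        weight_bg += count
        if weight_bg == 0:
            continue  # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is background from here on
        sum_bg += level * count
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_level = variance, level
    return best_level


def preprocess_for_ocr(image_path, scale=2):
    """Grayscale, upscale, and binarize an image (hypothetical helper)."""
    # Imported lazily so otsu_threshold can be used without Pillow installed.
    from PIL import Image, ImageOps

    gray = ImageOps.grayscale(Image.open(image_path))
    gray = gray.resize((gray.width * scale, gray.height * scale), Image.LANCZOS)
    threshold = otsu_threshold(gray.histogram())
    # Pixels brighter than the threshold become white, the rest black.
    return gray.point(lambda value: 255 if value > threshold else 0)
```

The returned image can be passed directly to `pytesseract.image_to_string(...)` in place of `Image.open(...)`.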
I generally prefer the website www.onlineocr.net for optical character recognition, as the results are almost perfect each time. You can try using their API for character recognition (it requires internet connectivity). The results obtained with this API are far superior to those from Tesseract OCR, so you may want to give it a try.
Upvotes: 1