Reputation: 521
I just installed Tesseract OCR, and after running the command $ tesseract --list-langs the output showed only 2 languages, eng and osd. My question is: how do I load another language, in my case specifically Japanese?
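The same check can be scripted. The sketch below is only an illustration: the helper names parse_list_langs and installed_languages are made up for this example, and it assumes the output consists of a "List of available languages ..." header line followed by one language code per line.

```python
import shutil
import subprocess


def parse_list_langs(output):
    """Extract language codes from `tesseract --list-langs` style output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.append(line)
    return langs


def installed_languages():
    """Return the languages tesseract reports, or [] if it is not on PATH."""
    if shutil.which("tesseract") is None:
        return []
    result = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True
    )
    # Assumption: the list arrives on stdout. Builds that write it elsewhere
    # would need this line adjusted.
    return parse_list_langs(result.stdout)


if __name__ == "__main__":
    langs = installed_languages()
    print("Installed languages:", langs)
    print("jpn installed:", "jpn" in langs)
```

If "jpn" is missing from the list, the Japanese trained data still has to be installed by one of the methods in the answers.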
Upvotes: 9
Views: 18148
Reputation: 66
On Mac, if you installed tesseract using brew, then put the trained data file for the language ("Japanese", or whatever the file name is) in:
/opt/homebrew/Cellar/tesseract/share/tessdata/
Upvotes: 1
Reputation: 141
1. pip install pytesseract
2. On Windows, install tesseract-ocr from
https://digi.bib.uni-mannheim.de/tesseract
and select all language options while installing.
3. Set the tesseract-ocr path in anaconda/lib/site-packages/pytesseract/pytesseract.py:
tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
4. Extract the text, passing the language code:
from pytesseract import image_to_string
print(image_to_string(test_file, lang='jpn'))  # for Japanese text extraction
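As an alternative to editing the installed pytesseract.py, pytesseract also lets you set pytesseract.pytesseract.tesseract_cmd from your own script. A minimal sketch follows. The helper names find_tesseract and ocr_japanese are hypothetical, the second candidate path is an assumption about where a 64-bit installer might put the binary, and sample-jpn.png stands in for your own image.

```python
import os


def find_tesseract(candidates):
    """Return the first candidate path that is an existing file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def ocr_japanese(image_path, tesseract_cmd=None):
    """Run OCR with the Japanese model; needs pytesseract and jpn data installed."""
    import pytesseract  # imported lazily so the helper above works without it

    if tesseract_cmd:
        # Point pytesseract at the binary without touching site-packages.
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
    # image_to_string accepts a file path as well as a PIL image.
    return pytesseract.image_to_string(image_path, lang="jpn")


if __name__ == "__main__":
    cmd = find_tesseract([
        r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",  # path from the steps above
        r"C:\Program Files\Tesseract-OCR\tesseract.exe",  # assumed alternative location
    ])
    print(ocr_japanese("sample-jpn.png", tesseract_cmd=cmd))
```

If neither candidate exists, cmd is None and pytesseract falls back to looking for tesseract on PATH.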
Upvotes: 2
Reputation: 31
This works for me (on a Debian/Ubuntu system, where apt-get is available):
sudo apt-get install tesseract-ocr-jpn
Hope this helps.
Upvotes: 3
Reputation: 521
I learned that by grabbing the trained data from https://github.com/tesseract-ocr/tessdata and placing it in the same directory as the other trained data (i.e., next to eng.traineddata), and by passing the language flag -l LANG, tesseract should be able to read the language you've specified. In the following example, Japanese:
tesseract -l jpn sample-jpn.png output-jpn
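For anyone who wants to script those two steps, here is a rough sketch. The function names are invented for this example, and the raw-download URL layout (.../raw/main/<lang>.traineddata) is my assumption about how GitHub serves files from that repository, so check it before relying on it. The tessdata directory is passed in rather than guessed, since its location varies by platform.

```python
import os
import subprocess
import sys
import urllib.request

# Assumed raw-file URL prefix for the tessdata repository (not from the original post).
TESSDATA_RAW = "https://github.com/tesseract-ocr/tessdata/raw/main"


def traineddata_url(lang):
    """Build the assumed download URL for a language's trained data."""
    return f"{TESSDATA_RAW}/{lang}.traineddata"


def install_language(lang, tessdata_dir):
    """Download <lang>.traineddata next to eng.traineddata unless already present."""
    if not os.path.isfile(os.path.join(tessdata_dir, "eng.traineddata")):
        raise FileNotFoundError(f"eng.traineddata not found in {tessdata_dir}")
    dest = os.path.join(tessdata_dir, f"{lang}.traineddata")
    if not os.path.exists(dest):
        urllib.request.urlretrieve(traineddata_url(lang), dest)
    return dest


def ocr_command(lang, image, output_base):
    """Argument list equivalent to: tesseract -l LANG image output_base"""
    return ["tesseract", "-l", lang, image, output_base]


if __name__ == "__main__":
    # Usage: python install_lang.py /path/to/tessdata
    install_language("jpn", sys.argv[1])
    subprocess.run(ocr_command("jpn", "sample-jpn.png", "output-jpn"), check=True)
```

Writing into the tessdata directory may need elevated permissions, depending on how tesseract was installed.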
Upvotes: 5