Dave Lin

Reputation: 88

Using multiple languages in Pytesser

I have started using Pytesser, which works great with English and with Chinese individually. Is there a way to have both languages recognized at the same time, or would I have to make my own traineddata file? My code is:

import Image
from pytesser import *
print image_to_string(Image.open("chinese_and_english.jpg"), lang="eng")
# I also want Chinese to be recognized in the same call

[image: chinese_and_english]

Upvotes: 6

Views: 16650

Answers (2)

Nelson

Reputation: 2178

PyTesseract supports multiple languages:

https://pypi.org/project/pytesseract/

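Specifically, the lang parameter takes a Tesseract language code string, and several codes can be joined with a plus sign (the project page gives lang='eng+fra' as an example).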
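As a rough sketch of how that could look for the image in the question: the helper names join_langs and ocr_image below are made up for illustration, and the code assumes that pytesseract, Pillow, the Tesseract binary, and the eng and chi_tra traineddata files are all installed. Swap chi_tra for chi_sim if the text is in Simplified Chinese.

```python
def join_langs(codes):
    """Join Tesseract language codes into one lang string, e.g. 'eng+chi_tra'.

    Hypothetical helper, not part of pytesseract.
    """
    cleaned = [c.strip() for c in codes if c and c.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def ocr_image(path, codes=("eng", "chi_tra")):
    """Run OCR on one image file with several languages loaded together.

    Hypothetical helper; assumes the matching traineddata files are installed.
    """
    # Imported lazily so join_langs works even without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=join_langs(codes))


# Example call (needs a real image and a working Tesseract install):
# print(ocr_image("chinese_and_english.jpg"))
```

The order of the codes can matter for accuracy, so it may be worth trying both 'eng+chi_tra' and 'chi_tra+eng' on your own images.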

Upvotes: 0

sirfz

Reputation: 4277

I'm not sure about Pytesser, but with tesserocr you can specify multiple languages by joining their codes with a plus sign. For example:

import tesserocr

# Load English and Traditional Chinese together via 'eng+chi_tra'
with tesserocr.PyTessBaseAPI(lang='eng+chi_tra') as api:
    api.SetImageFile('eSXSz.jpg')
    print(api.GetUTF8Text())

# or simply
print(tesserocr.file_to_text('eSXSz.jpg', lang='eng+chi_tra'))

Example output for your image:

In [8]: print tesserocr.file_to_text('eSXSz.jpg', lang='eng+chi_tra')
Character, Chmese 動m川爬d
胸肌岫馴伽 H枷﹏ P﹏… …

〔Manda‥﹝ 二 Standard C…爬虯



一

口

X慣ng怕ng

Note that it's more efficient to initialize the API once as in the first example and re-use it for multiple images by calling SetImageFile (or SetImage with a PIL.Image object) to avoid re-initializing the API every time.
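A minimal sketch of that reuse pattern, assuming tesserocr and the eng and chi_tra traineddata files are installed. The function name ocr_many and its api_factory parameter are hypothetical additions for illustration (the factory only exists so the loop can be exercised without Tesseract); they are not part of tesserocr.

```python
def ocr_many(paths, lang="eng+chi_tra", api_factory=None):
    """OCR several image files with a single API instance.

    Hypothetical helper: api_factory defaults to tesserocr.PyTessBaseAPI
    and is only a parameter so that a stand-in can be passed for testing.
    """
    if api_factory is None:
        # Imported lazily so the module loads even without tesserocr.
        import tesserocr
        api_factory = tesserocr.PyTessBaseAPI

    results = {}
    # The API (and its language data) is initialized once here...
    with api_factory(lang=lang) as api:
        for path in paths:
            # ...and only the image changes on each iteration.
            api.SetImageFile(path)
            results[path] = api.GetUTF8Text()
    return results


# Example call (needs real image files and a working tesserocr install):
# texts = ocr_many(["page1.jpg", "page2.jpg"])
```

If you already have PIL.Image objects in memory, the same loop should work with api.SetImage(image) in place of api.SetImageFile(path).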

Upvotes: 10
