greident

Reputation: 45

Tesseract can't initialize the Russian language

My code:

    private void button1_Click(object sender, EventArgs e)
    {
        if (openFileDialog1.ShowDialog() == DialogResult.OK)
        {
            textBox1.Clear();

            var img = new Bitmap(openFileDialog1.FileName);

            //var ocr = new TesseractEngine("./tessdata", "eng", EngineMode.TesseractAndCube);
            var ocr = new TesseractEngine("./rus", "rus", EngineMode.TesseractAndCube);

            var page = ocr.Process(img);
            textBox1.Text = page.GetText();
        }
    }

The code works fine with the English trained data, but it throws an error when I switch to Russian.

Here is the error:

Tesseract.TesseractException: "Failed to initialise tesseract engine.. See https://github.com/charlesw/tesseract/wiki/Error-1 for details."

My Tesseract version is 3.0.2.

I downloaded the Russian tessdata files from https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#data-files-for-version-302

Upvotes: 1

Views: 4182

Answers (2)

Ivan

Reputation: 1

I can confirm this problem. Tesseract runs fine with another single language (I tried bul.traineddata), but "rus" always gives this result in logcat:

Could not initialize Tesseract API with language=rus!

Of course, I did have the rus.traineddata file in assets :-)

Upvotes: 0

benderalex5

Reputation: 137

This works for me:

    Tesseract tesseract = new Tesseract();
    tesseract.setLanguage("rus");
    try {
        // Directory that contains rus.traineddata
        tesseract.setDatapath("/home/test/tessdata");
        // Run OCR on the image and print the recognized text
        String text = tesseract.doOCR(new File("/home/test/Pictures/photo.jpg"));
        System.out.print(text);
    } catch (TesseractException e) {
        e.printStackTrace();
    }

Trained data: https://github.com/tesseract-ocr/tessdata
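A common reason for an init failure like the ones reported in this thread is that the language file is not in the directory passed as the data path. Whether that is the cause here cannot be told from the posts alone, so treat the following as a minimal sketch of a pre-flight check rather than a confirmed fix. It uses only the Java standard library. The class name `TessdataCheck` and the helper `hasTrainedData` are made up for illustration and are not part of any Tesseract binding, and the default path is simply the one from the snippet above.

```java
import java.io.File;

class TessdataCheck {

    // Hypothetical helper (not a Tesseract API): returns true when
    // <datapath>/<lang>.traineddata exists as a non-empty regular file.
    static boolean hasTrainedData(File datapath, String lang) {
        File trained = new File(datapath, lang + ".traineddata");
        return trained.isFile() && trained.length() > 0;
    }

    public static void main(String[] args) {
        // Assumed defaults, taken from the snippet above; pass your own as arguments.
        File datapath = new File(args.length > 0 ? args[0] : "/home/test/tessdata");
        String lang = args.length > 1 ? args[1] : "rus";

        if (hasTrainedData(datapath, lang)) {
            System.out.println("Found " + lang + ".traineddata in " + datapath);
        } else {
            System.out.println("Missing or empty: " + new File(datapath, lang + ".traineddata"));
        }
    }
}
```

Running this against the same directory and language code that are handed to the OCR engine shows quickly whether the file is where the engine will look for it. It does not verify that the file matches the installed Tesseract version.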

Upvotes: 2
