Lazyac

Reputation: 31

How to bind Tika python with Tesseract OCR?

When I run this in the terminal, it works perfectly:

tesseract 1.jpg outPutFileHere -l fra

But I'm trying to make it work with Tika:

from tika import parser

TextImage = "1.jpg"  # the same image that works with the tesseract CLI
tikedDocument = parser.from_file(TextImage)
print(tikedDocument["content"])  # no text comes back here

With the same image, Tika returns no text at all.

Does anyone have an idea what's going on?

Thank you

Upvotes: 3

Views: 3284

Answers (1)

user1375602

You need to provide a header called "X-Tika-OCRLanguage" with the Tesseract language code(s) to use, for example:

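from tika import parser

path = "1.jpg"  # image to OCR
headers = {
    "X-Tika-OCRLanguage": "eng+nor"  # Tesseract language codes, joined with "+"
}
parsed = parser.from_file(path, headers=headers)
print(parsed["content"])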
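To mirror the question's `-l fra` command, a minimal sketch could build the header from a list of Tesseract language codes and pass it to tika-python. It assumes tika-python, Java and the same Tesseract install (with French language data) are available. The helper names `ocr_language_headers` and `ocr_image` are hypothetical and not part of tika-python.

```python
from typing import Iterable, Optional


def ocr_language_headers(langs: Iterable[str]) -> dict:
    """Build the X-Tika-OCRLanguage header from Tesseract language codes.

    Multiple languages are joined with '+', as in 'eng+nor'.
    """
    codes = [code.strip() for code in langs if code.strip()]
    if not codes:
        raise ValueError("at least one Tesseract language code is required")
    return {"X-Tika-OCRLanguage": "+".join(codes)}


def ocr_image(path: str, langs: Iterable[str]) -> Optional[str]:
    """OCR an image through Tika (hypothetical helper).

    Requires tika-python, Java and Tesseract with the matching language data.
    """
    # Imported here so the header helper above works without tika installed.
    from tika import parser

    parsed = parser.from_file(path, headers=ocr_language_headers(langs))
    # tika-python returns a dict; "content" may be None if nothing was extracted.
    return parsed.get("content")


if __name__ == "__main__":
    # Header equivalent of `tesseract 1.jpg outPutFileHere -l fra`
    print(ocr_language_headers(["fra"]))
```

With that in place, `ocr_image("1.jpg", ["fra"])` should OCR the image with the French model, in the same way as the `-l fra` flag on the command line.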

Upvotes: 3
