Ian

Reputation: 287

Tesseract for Java: setting TESSDATA_PREFIX for an executable jar

The ultimate goal of this project is a jar that I can put in a directory, where it runs Tesseract and writes a results directory containing the output txt file. I am having some issues with Tesseract, though. I am working with tess4j in Java with Maven and I want to package my code as an executable jar. The project works fine as a desktop app, but whenever I run it with java -jar fileName.jar (after exporting to a jar) it gives me this error:

Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory
Failed loading language 'eng'
...

I looked online and couldn't find out how to set up Tesseract for a jar and get the paths right. I use Maven and have the Tesseract dependency in my pom file (tess4j, version 3.0), and I have the tessdata directory in my project.

I am fairly new to Maven and jar files and have never used Tesseract before, but as far as I can tell from what I found online, I set it up correctly.

Does anyone know how to make tess4j point to the tessdata directory in my project using a dynamic path, so I can use the jar on multiple computers and in different locations?

This is how I call Tesseract

    Tesseract instance = new Tesseract();
    instance.setDatapath("src/main/resources");
    String result = instance.doOCR(imageFile);
    String fileName = imageFile.getName().replace(".jpg", "");
    System.out.println("Parsed Image " + fileName);
    return result;

EDIT

This is how I tried to set the environment variable TESSDATA_PREFIX in my code

    String dir = System.getProperty("user.dir");
    System.out.println("current dir = " + dir);
    ProcessBuilder pb = new ProcessBuilder("CMD", "/C", "SET");
    Map<String, String> env = pb.environment();
    env.put("TESSDATA_PREFIX", dir + "\\tessdata");
    Process p = pb.start();

But this had no discernible effect. I still got the same error.

EDIT 2

According to the error message, I need to set it to the parent directory of tessdata. I tried that as well, to no avail.

EDIT 3

After a ton of searching and trying to fix it, I am not sure it is even possible. The doOCR method in tess4j takes a BufferedImage or a File, which would be fine if my images were static, but they are dynamic, so I can't really store them in the jar. Not to mention that TESSDATA_PREFIX still won't set. If anyone has any ideas, I am all ears. I will keep looking for a solution, but I'm not sure it will work at all.

Upvotes: 5

Views: 9180

Answers (2)

Ian

Reputation: 287

It started working, seemingly at random, when I:

  1. put the tessdata folder in the same directory as my jar

  2. changed the setDatapath to the following

    Tesseract instance = new Tesseract();
    instance.setDatapath(".");
    String result = instance.doOCR(imageFile);
    String fileName = imageFile.getName().replace(".jpg", "");
    System.out.println("Parsed Image " + fileName);
    return result;
    

  3. exported from Eclipse by right-clicking the project, choosing Export, selecting Java -> Runnable JAR file, and then picking the library handling option "Extract required libraries into generated JAR".

(Side note: the environment-variable code I was using earlier no longer needs to be in the project.)
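As far as I can tell, the earlier ProcessBuilder attempt had no effect because pb.environment() only describes the environment of the child process that the builder would start. It does not change the environment of the JVM that is already running, which is where tess4j loads Tesseract. Here is a minimal sketch of that behaviour using only the JDK; the class name EnvScopeDemo, the method name, and the example path are made up for illustration:

```java
import java.util.Map;
import java.util.Objects;

public class EnvScopeDemo {

    /**
     * Puts a variable into a ProcessBuilder's environment map and reports
     * whether this JVM's own view of that variable stayed the same.
     * (Hypothetical helper, for illustration only.)
     */
    public static boolean parentEnvUnchanged(String name, String value) {
        String before = System.getenv(name);

        // The builder is never started; only its environment map is edited.
        ProcessBuilder pb = new ProcessBuilder("java", "-version");
        Map<String, String> childEnv = pb.environment();
        childEnv.put(name, value);

        String after = System.getenv(name);

        // The child map holds the new value, the current JVM does not see it.
        return Objects.equals(before, after) && value.equals(childEnv.get(name));
    }

    public static void main(String[] args) {
        System.out.println(parentEnvUnchanged("TESSDATA_PREFIX", "C:\\example\\dir"));
    }
}
```

So if TESSDATA_PREFIX is needed at all, it would have to be set before the JVM starts (for example in the shell or a launcher script). Calling setDatapath avoids the need for it here.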

I really thought I had tried this already, but I guess something must have been wrong. I removed tessdata from my project and will have to include it wherever the jar is run. I'm not really sure why it started working, but I'm glad it did.

Upvotes: 1

nguyenq

Reputation: 8345

You can call the instance.setDatapath method to point Tesseract to the location of your tessdata folder.

http://tess4j.sourceforge.net/docs/docs-3.0/
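A relative datapath such as "." is resolved against the working directory, so it only works when the jar is launched from its own folder. One way to make the path follow the jar is to derive it from the location the class was loaded from. The sketch below uses only the JDK; TessdataLocator, baseDir and parentIfFile are hypothetical names, and it assumes the tessdata folder sits next to the jar and that the datapath should be its parent directory, as in the setup described in the other answer:

```java
import java.io.File;
import java.net.URISyntaxException;
import java.security.CodeSource;

public class TessdataLocator {

    /** A jar is a file, so use its folder; a classes directory is used as-is. */
    public static File parentIfFile(File location) {
        return location.isFile() ? location.getParentFile() : location;
    }

    /**
     * Directory that holds the jar (or classes folder) the given class was
     * loaded from. Falls back to the working directory if that location
     * cannot be determined.
     */
    public static File baseDir(Class<?> anchor) {
        try {
            CodeSource src = anchor.getProtectionDomain().getCodeSource();
            if (src != null && src.getLocation() != null) {
                return parentIfFile(new File(src.getLocation().toURI()));
            }
        } catch (URISyntaxException | IllegalArgumentException e) {
            // Location is not a plain file path; use the fallback below.
        }
        return new File(System.getProperty("user.dir"));
    }

    public static void main(String[] args) {
        File base = baseDir(TessdataLocator.class);
        System.out.println("datapath candidate = " + base.getAbsolutePath());
        // With tess4j on the classpath, this would be passed on like so:
        //   Tesseract instance = new Tesseract();
        //   instance.setDatapath(base.getAbsolutePath());
    }
}
```

Whether the datapath should be the folder containing tessdata or the tessdata folder itself may depend on the tess4j version, so check that against the documentation for the version you use.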

Upvotes: 1
