Reputation: 287
The ultimate goal of this project is to drop the jar into a directory, have it run Tesseract, and produce a results directory with the output txt file. I am having some issues with Tesseract, though.
I am working with tess4j in Java with Maven and I want to package my code as an executable jar. The project works fine as a desktop app, but whenever I try to run it using java -jar fileName.jar
(after exporting to a jar) it gives me the error
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory
Failed loading language 'eng'
...
I looked online and couldn't really find out how to set up Tesseract for a jar and get the paths right. I use Maven and have the Tesseract dependency in my pom file (tess4j version 3.0), and I have the tessdata folder in my project.
I am fairly new to Maven and jar files and have never used Tesseract before, but as far as I can tell from the internet I set it up correctly.
Does anyone know how to make tess4j point to the tessdata directory in my project and have a dynamic path, so I can move the jar and use it on multiple computers and in different locations?
This is how I call Tesseract:
Tesseract instance = new Tesseract();
instance.setDatapath("src/main/resources");
String result = instance.doOCR(imageFile);
String fileName = imageFile.getName().replace(".jpg", "");
System.out.println("Parsed Image " + fileName);
return result;
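One thing worth checking here: a relative data path such as "src/main/resources" is resolved against the JVM's working directory (the user.dir system property), not against the jar. The small sketch below, with a hypothetical class name RelativePathDemo, prints where that path actually points, which may explain why it works from the IDE but not from java -jar in another folder.

```java
import java.io.File;

public class RelativePathDemo {

    /** Returns the absolute location a relative path resolves to in this JVM. */
    public static File resolve(String relativePath) {
        // getAbsoluteFile() resolves a relative path against the working directory (user.dir)
        return new File(relativePath).getAbsoluteFile();
    }

    public static void main(String[] args) {
        File dataPath = resolve("src/main/resources");
        System.out.println("working dir    = " + System.getProperty("user.dir"));
        System.out.println("data path      = " + dataPath);
        // When the jar is started from some other folder, this is usually false
        System.out.println("tessdata found = " + new File(dataPath, "tessdata").isDirectory());
    }
}
```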
EDIT
This is how I tried to set the environment variable TESSDATA_PREFIX in my code
String dir = System.getProperty("user.dir");
System.out.println("current dir = " + dir);
ProcessBuilder pb = new ProcessBuilder("CMD", "/C", "SET");
Map<String, String> env = pb.environment();
env.put("TESSDATA_PREFIX", dir + "\\tessdata");
Process p = pb.start();
But this had no discernible effect; I still got the same error.
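A likely reason the snippet above changes nothing: ProcessBuilder.environment() only affects the child process that the builder starts (here a CMD shell), while the native Tesseract library is loaded into the current JVM and sees that JVM's environment. The sketch below illustrates this; ChildEnvDemo and the variable name TESSDATA_PREFIX_DEMO are made up for the demonstration.

```java
public class ChildEnvDemo {

    /**
     * Puts a variable into a ProcessBuilder's environment and reports whether
     * the current JVM's own environment now contains that value.
     */
    public static boolean leaksIntoParent(String name, String value) {
        // The command is never started; only the builder's environment map is modified
        ProcessBuilder pb = new ProcessBuilder("java", "-version");
        pb.environment().put(name, value);
        // System.getenv reads the environment of this JVM, not the builder's copy
        return value.equals(System.getenv(name));
    }

    public static void main(String[] args) {
        // Prints false unless the variable was already set to this value outside the JVM
        System.out.println(leaksIntoParent("TESSDATA_PREFIX_DEMO", "demo-value-123"));
    }
}
```

If the environment variable is really needed, it would have to be set before the JVM starts, for example in the shell or a launcher script that runs java -jar.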
EDIT 2
According to the error message, I need to set it to the parent directory of tessdata. I also tried this, to no avail.
EDIT 3
After a ton of searching and trying to fix it, I am not sure it is even possible. The doOCR method in Tesseract takes a BufferedImage or a File, which would be all right if my images weren't dynamic, but they are, so I can't really store them in the jar. Not to mention that TESSDATA_PREFIX still won't set. If anyone has any ideas, I am still all ears. I will keep looking for a solution, but I'm not sure it will work at all.
Upvotes: 5
Views: 9180
Reputation: 287
It randomly started working when I did the following:
1. Put the tessdata folder in the same directory as my jar.
2. Changed the setDatapath call to the following:
Tesseract instance = new Tesseract();
instance.setDatapath(".");
String result = instance.doOCR(imageFile);
String fileName = imageFile.getName().replace(".jpg", "");
System.out.println("Parsed Image " + fileName);
return result;
3. Exported from Eclipse by right-clicking the project, selecting Export > Java > Runnable JAR file, then choosing the library handling option "Extract required libraries into generated JAR".
(Side note: the environment variable code I tried earlier does not need to be in the project anymore.)
I really thought I had tried this before, but I guess something must have been wrong. I removed tessdata from my project and will have to include it wherever the jar is run. I'm not really sure why it started working, but I'm glad it did.
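If the aim is to keep tessdata next to the jar regardless of the directory the command is run from, one option is to derive the folder from the jar's own location rather than relying on ".". The following is a minimal sketch, not something taken from the tess4j documentation; TessdataLocator and baseDirFor are hypothetical names, and it assumes the class is loaded from a jar or a classes directory on the local file system.

```java
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.CodeSource;

public class TessdataLocator {

    /** For a jar path, returns the folder containing it; for a directory, returns the directory itself. */
    public static Path baseDirFor(Path codeLocation) {
        Path abs = codeLocation.toAbsolutePath().normalize();
        return Files.isDirectory(abs) ? abs : abs.getParent();
    }

    /** Looks up where the given class was loaded from and returns the matching base folder. */
    public static Path baseDirFor(Class<?> anchor) throws URISyntaxException {
        CodeSource source = anchor.getProtectionDomain().getCodeSource();
        if (source == null) {
            throw new IllegalStateException("No code source available for " + anchor);
        }
        return baseDirFor(Paths.get(source.getLocation().toURI()));
    }

    public static void main(String[] args) throws Exception {
        Path base = baseDirFor(TessdataLocator.class);
        // The tessdata folder is expected to sit directly inside this base folder
        System.out.println("tessdata expected at: " + base.resolve("tessdata"));
    }
}
```

The resulting path could then be passed to instance.setDatapath(base.toString()) in place of ".". This matches the tess4j 3.0 setup described above, where the data path is the parent of tessdata; as far as I know, newer Tesseract versions expect the tessdata directory itself, so check which one your version wants.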
Upvotes: 1
Reputation: 8345
You can invoke the instance.setDatapath method to point Tesseract to the location of your tessdata folder.
http://tess4j.sourceforge.net/docs/docs-3.0/
Upvotes: 1