Reputation: 5313
I am working on a Spring MVC application that uses Tesseract (through Tess4J) for OCR. I am getting an IndexOutOfBoundsException for the file I pass in. Any ideas?
Error log :
net.sourceforge.tess4j.TesseractException: java.lang.IndexOutOfBoundsException
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:215)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:196)
at com.tooltank.spring.service.GroupAttachmentsServiceImpl.testOcr(GroupAttachmentsServiceImpl.java:839)
at com.tooltank.spring.service.GroupAttachmentsServiceImpl.lambda$addAttachment$0(GroupAttachmentsServiceImpl.java:447)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IndexOutOfBoundsException
at javax.imageio.stream.FileCacheImageOutputStream.seek(FileCacheImageOutputStream.java:170)
at net.sourceforge.tess4j.util.ImageIOHelper.getImageByteBuffer(ImageIOHelper.java:297)
at net.sourceforge.tess4j.Tesseract.setImage(Tesseract.java:397)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:290)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:212)
... 4 more
Code :
private String testOcr(String fileLocation, int attachId) {
    try {
        File imageFile = new File(fileLocation);
        BufferedImage img = ImageIO.read(imageFile);
        // Convert the input to a black-and-white image before running OCR
        BufferedImage blackNWhite = new BufferedImage(img.getWidth(), img.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        Graphics2D graphics = blackNWhite.createGraphics();
        graphics.drawImage(img, 0, 0, null);
        graphics.dispose();
        String identifier = new BigInteger(130, random).toString(32);
        String blackAndWhiteImage = previewPath + identifier + ".png";
        File outputfile = new File(blackAndWhiteImage);
        ImageIO.write(blackNWhite, "png", outputfile);
        ITesseract instance = new Tesseract();
        // Point to one folder above tessdata directory, must contain training data
        instance.setDatapath("/usr/share/tesseract-ocr/");
        // ISO 639-3 standard
        instance.setLanguage("deu");
        String result = instance.doOCR(outputfile);
        result = result.replaceAll("[^a-zA-Z0-9öÖäÄüÜß@\\s]", "");
        Files.delete(outputfile.toPath());
        GroupAttachments groupAttachments = this.groupAttachmentsDAO.getAttachmenById(attachId);
        System.out.println("OCR Result is " + result);
        if (groupAttachments != null) {
            saveIndexes(result, groupAttachments.getFileName(), null, groupAttachments.getGroupId(), false, attachId);
        }
        return result;
    } catch (Exception e) {
        e.printStackTrace();
    }
    return null;
}
Thank you.
Upvotes: 3
Views: 823
Reputation: 31
Try upgrading to tess4j version 3.4.1. That solved the issue for me.
Upvotes: 0
Reputation: 13380
Due to a bug in Java Image I/O (fixed in Java 9), the current version of the Java Tesseract wrapper Tess4J (3.4.0 at the time this answer was written) does not work on Java versions below 9. To run on lower Java versions, you can try the following fix to the Tess4J ImageIOHelper class. Make a copy of the class in your project and apply the change below; it then works with both files and BufferedImages.
Note: This version does not use the TIFF optimization from the original class. You can add it back if your project needs it.
public static ByteBuffer getImageByteBuffer(RenderedImage image) throws IOException {
    // A BufferedImage can be converted directly
    if (image instanceof BufferedImage) {
        return convertImageData((BufferedImage) image);
    }
    // Otherwise copy the RenderedImage into a new BufferedImage first
    ColorModel cm = image.getColorModel();
    int width = image.getWidth();
    int height = image.getHeight();
    WritableRaster raster = cm.createCompatibleWritableRaster(width, height);
    boolean isAlphaPremultiplied = cm.isAlphaPremultiplied();
    Hashtable<String, Object> properties = new Hashtable<>();
    String[] keys = image.getPropertyNames();
    if (keys != null) {
        for (int i = 0; i < keys.length; i++) {
            properties.put(keys[i], image.getProperty(keys[i]));
        }
    }
    BufferedImage result = new BufferedImage(cm, raster, isAlphaPremultiplied, properties);
    image.copyData(raster);
    return convertImageData(result);
}
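If the same build has to run on both older and newer JVMs, a runtime check can tell you whether the patched helper is needed at all. The following is only a minimal sketch: the class name JavaVersionCheck and both method names are hypothetical and not part of Tess4J. It relies on the standard java.specification.version system property, which has the form "1.8" before Java 9 and "9", "11", and so on afterwards.

```java
public final class JavaVersionCheck {
    private JavaVersionCheck() {}

    /** Hypothetical helper: extracts the major version from a spec version string. */
    public static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Before Java 9 the property was written as "1.x", so the major version is the second part.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    /** Hypothetical helper: true for the Java versions below 9 described in this answer. */
    public static boolean needsImageIoWorkaround(String specVersion) {
        return majorVersion(specVersion) < 9;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.specification.version");
        System.out.println("Java " + version + ", workaround needed: " + needsImageIoWorkaround(version));
    }
}
```

You could call needsImageIoWorkaround once at startup and use the copied ImageIOHelper only when it returns true.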
Upvotes: 4