I have an EBCDIC file from which I extracted images. Some of the data on these images is key to identifying my transactions. Suppose I have an image of the Stack Overflow logo stored as "img1.jpg" on my desktop. When I read it with the following code, it works:
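import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
String inputImage = "C:\\Desktop\\img1.jpg";
File imageFile = new File(inputImage);
BufferedImage image1 = ImageIO.read(imageFile); // throws IOException on I/O errors; returns null if no reader recognizes the data
System.out.println(image1);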
However, when I try the same thing with an image decoded from the EBCDIC file, ImageIO.read returns null.
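ImageIO.read does not throw when it cannot decode a file; it returns null if no registered ImageReader claims the data. A first diagnostic step is therefore to check what the decoded bytes actually are, since the ".jpg" name may not match the content. A minimal sketch, assuming the decoded image is available as a byte[] (the class and method names are hypothetical): it checks a few well-known file signatures and asks ImageIO which readers accept the bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

public class ImageFormatCheck {

    // Guess the container format from the leading bytes (file signature),
    // independent of the file extension.
    public static String sniffFormat(byte[] data) {
        if (startsWith(data, 0xFF, 0xD8, 0xFF)) return "jpeg";
        if (startsWith(data, 0x49, 0x49, 0x2A, 0x00)) return "tiff"; // little-endian "II*\0"
        if (startsWith(data, 0x4D, 0x4D, 0x00, 0x2A)) return "tiff"; // big-endian "MM\0*"
        if (startsWith(data, 0x89, 0x50, 0x4E, 0x47)) return "png";
        if (startsWith(data, 0x47, 0x49, 0x46, 0x38)) return "gif";  // "GIF8"
        if (startsWith(data, 0x42, 0x4D)) return "bmp";              // "BM"
        return "unknown";
    }

    private static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) return false;
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) return false;
        }
        return true;
    }

    // Format names of the ImageIO readers that claim they can decode these bytes.
    // An empty list corresponds to the situation in which ImageIO.read(...) returns null.
    public static List<String> readerFormats(byte[] data) throws IOException {
        List<String> names = new ArrayList<>();
        try (ImageInputStream in = ImageIO.createImageInputStream(new ByteArrayInputStream(data))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            while (readers.hasNext()) {
                names.add(readers.next().getFormatName());
            }
        }
        return names;
    }
}
```

If sniffFormat reports something other than "jpeg" while readerFormats is empty, the file is not a JPEG at all and the problem is the format, not the missing color.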
The only difference I noticed is that the decoded image has no color. Is there a way to read these images and extract the text on them? The sample below is not the exact image I am working with; it is just an example from the internet to give an idea.
Note: The image I am working on looks like a scanned grayscale image.
Example: [sample image not included]
Also, I noticed that if I open the decoded file, take a screen capture with the Snipping Tool, and save it as a JPG file (even though the original already has a .jpg name), Java reads that file without any problem. I'm not sure where the issue lies: compression, color encoding, or file format.
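The screen-capture experiment points toward the file format rather than color: saving from the Snipping Tool as JPG produces a genuine JPEG, whereas the decoded file may only carry a .jpg name. The follow-up below mentions that the images were TIFF, and ImageIO's built-in TIFF plugin only ships with Java 9 and later, so on Java 8 one plausible explanation is that no TIFF reader is registered and ImageIO.read returns null. A minimal sketch (the class name is hypothetical) that lists which formats the running JVM can read:

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import javax.imageio.ImageIO;

public class ReaderSupport {

    // Format names for which an ImageReader is registered in this JVM,
    // normalized to lower case ("jpeg", "png", "tiff", ...).
    public static Set<String> supportedReadFormats() {
        Set<String> names = new TreeSet<>();
        for (String name : ImageIO.getReaderFormatNames()) {
            names.add(name.toLowerCase(Locale.ROOT));
        }
        return names;
    }

    // True if some registered reader advertises the "tiff" format name.
    public static boolean canReadTiff() {
        return supportedReadFormats().contains("tiff");
    }

    public static void main(String[] args) {
        System.out.println("Readable formats: " + supportedReadFormats());
        System.out.println("TIFF reader available: " + canReadTiff());
    }
}
```

If "tiff" is missing, upgrading the JDK or adding a third-party ImageIO TIFF plugin to the classpath are the usual options.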
Thank you, everyone. I used Tess4J to run OCR on the TIFF image. Unfortunately, the information I was looking for isn't in the extracted text, but the proof of concept is done. I used the following library and placed eng.traineddata in the folder where the images are:
import java.io.File;
import net.sourceforge.tess4j.*;
String inputImage = "C:\\Desktop\\img1.tiff";
File imageFile = new File(inputImage);
ITesseract imageRead = new Tesseract();
// Folder that contains eng.traineddata
imageRead.setDataPath("C:\\Desktop\\");
imageRead.setLanguage("eng");
// doOCR throws TesseractException if recognition fails
String imageText = imageRead.doOCR(imageFile);
System.out.println(imageText);
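Tesseract fails if it cannot find the language data, so a small pre-check before calling doOCR can save some debugging. A minimal sketch in plain Java (no Tess4J calls; the helper name is hypothetical), assuming the layout described above where eng.traineddata sits directly in the folder passed to setDataPath:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TessDataCheck {

    // Hypothetical helper: true if <dataPath>/<language>.traineddata exists
    // as a regular file, matching the folder layout used with setDataPath above.
    public static boolean hasTrainedData(String dataPath, String language) {
        Path model = Paths.get(dataPath, language + ".traineddata");
        return Files.isRegularFile(model);
    }

    public static void main(String[] args) {
        String dataPath = "C:\\Desktop\\";
        if (!hasTrainedData(dataPath, "eng")) {
            System.err.println("eng.traineddata not found in " + dataPath);
        }
    }
}
```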