Codester2020

Reputation: 9

Unable to read the EBCDIC 037 decoded image (Java)

I have an EBCDIC file from which I extracted images. The images contain data that is the key to identifying my transactions. Suppose I have an image of the Stack Overflow logo stored as "img1.jpg" on my desktop. When I read it using the following code, it works:

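import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

String inputImage = "C:\\Desktop\\img1.jpg";
File imageFile = new File(inputImage);
// ImageIO.read returns null if no registered reader can decode the file
BufferedImage image1 = ImageIO.read(imageFile);
System.out.println(image1);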

However, when I attempt the same with an image decoded from the EBCDIC conversion, it returns null.
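
ImageIO.read is documented to return null when no registered ImageReader claims it can decode the input, which is one possible explanation for the null above. The following is a minimal sketch for checking that, not a diagnosis of the actual file. The class name ReaderProbe and the method readerFormat are hypothetical helpers introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;

// Hypothetical helper, not part of the original post.
public class ReaderProbe {
    // Returns the format name of the first ImageIO reader that accepts the
    // bytes, or null if no registered reader recognises them.
    public static String readerFormat(byte[] data) throws IOException {
        try (ImageInputStream in =
                 new MemoryCacheImageInputStream(new ByteArrayInputStream(data))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                return null; // same situation in which ImageIO.read returns null
            }
            ImageReader reader = readers.next();
            try {
                return reader.getFormatName();
            } finally {
                reader.dispose();
            }
        }
    }
}
```

Passing the bytes of the decoded file (for example from Files.readAllBytes) would show whether the running JVM has any reader for it. A null result would suggest the problem is the format rather than the image content.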

The difference I observed is that the decoded image has no color. Is there any way to read these images and retrieve the text on them? The following is not the exact image I am working on; it is a sample from the internet, shared just to give an idea. Note: the image I am working on looks like a scanned (grayscale) image. Example: [image]

Also, I observed that if I open the decoded file, take a screen capture with the Snipping Tool, and store it as a jpg file (which it already is), the system reads that file. I am not sure where the issue is: compression, color coding, or format.
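
One way to narrow down the "compression, color coding, or format" question is to look at the first few bytes of the decoded file, since a file extension does not guarantee the actual format. The answer below treats the image as TIFF, so it is possible (an assumption, not confirmed by the post) that the decoded data is TIFF saved with a .jpg name. Note that the JDK only ships a TIFF reader for ImageIO from Java 9 onward. The sketch below uses a hypothetical class ImageSniffer and checks only a few well-known signatures.

```java
// Hypothetical helper, not part of the original post.
public class ImageSniffer {
    // Guesses the container format from the leading "magic" bytes.
    public static String sniffFormat(byte[] h) {
        if (h == null || h.length < 4) {
            return "UNKNOWN";
        }
        int b0 = h[0] & 0xFF, b1 = h[1] & 0xFF, b2 = h[2] & 0xFF, b3 = h[3] & 0xFF;
        if (b0 == 0xFF && b1 == 0xD8 && b2 == 0xFF) {
            return "JPEG";
        }
        if (b0 == 'I' && b1 == 'I' && b2 == 42 && b3 == 0) {
            return "TIFF"; // little-endian TIFF header
        }
        if (b0 == 'M' && b1 == 'M' && b2 == 0 && b3 == 42) {
            return "TIFF"; // big-endian TIFF header
        }
        if (b0 == 0x89 && b1 == 'P' && b2 == 'N' && b3 == 'G') {
            return "PNG";
        }
        return "UNKNOWN";
    }
}
```

Feeding it the first four bytes of the decoded file and of the Snipping Tool copy would show whether the two files are really the same format.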

Upvotes: 0

Views: 189

Answers (1)

Codester2020

Reputation: 9

Thank you, everyone. I used Tess4J to extract text from the TIFF image. Unfortunately, the information I was looking for isn't available in the decoded text, but the POC is done. I used the following library and added eng.traineddata to the folder where the images are stored:

import java.io.File;
import net.sourceforge.tess4j.*;

String inputImage = "C:\\Desktop\\img1.tiff";
File imageFile = new File(inputImage);
ITesseract imageRead = new Tesseract();
// Folder that contains eng.traineddata
imageRead.setDatapath("C:\\Desktop\\");
imageRead.setLanguage("eng");
// doOCR throws TesseractException on failure
String imageText = imageRead.doOCR(imageFile);
System.out.println(imageText);

Upvotes: 0
