Rajasankar

Reputation: 928

How to get text from a PNG file using Java

I want to check whether a particular string is present in an image. Is that possible? Can pngj do that?

My file will contain a graph and some legends. I want to check whether the legends are correct.

Upvotes: 3

Views: 3590

Answers (5)

zamming

Reputation: 19

You can use Tesseract (via Tess4J). Sample code below:

// Requires Tess4J: import net.sourceforge.tess4j.Tesseract; import java.io.File;
String src = "path of File";
Tesseract instance = new Tesseract();
instance.setDatapath("path of tessdata\\Tess4J\\tessdata");
// doOCR throws TesseractException if recognition fails
String ocrString = instance.doOCR(new File(src));
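
Once the OCR call returns, checking for the string is an ordinary text comparison. OCR output often differs from the expected text in case and whitespace, so a minimal sketch might normalize both sides before comparing. The class and method names below are hypothetical and not part of Tess4J, and the sample string is a stand-in for a real OCR result:

```java
import java.util.Locale;

public class OcrTextCheck {
    /** Collapses whitespace runs and lowercases, so line breaks and casing in the OCR output do not matter. */
    public static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** Returns true if the expected string occurs in the OCR output after normalization. */
    public static boolean containsText(String ocrOutput, String expected) {
        return normalize(ocrOutput).contains(normalize(expected));
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by the OCR engine (assumption for illustration).
        String ocrString = "Sales  2010\nRevenue (USD)\n";
        System.out.println(containsText(ocrString, "revenue (usd)")); // prints "true"
    }
}
```

This only tolerates case and spacing differences; misrecognized characters (for example "0" read as "O") would still cause a mismatch and need fuzzier matching.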

Upvotes: 0

Haimei

Reputation: 13015

Here I use Scala for my solution. If you are a Java developer, it should be easy to convert the Scala code to Java.

Step 1: add one more line to build.sbt

libraryDependencies += "com.asprise.ocr" % "java-ocr-api" % "[15,)"

Step 2: import the library

import com.asprise.ocr.Ocr

Step 3: Scala code. Please note: <your_file> here is of type File. If you only have a file name or path, wrap it with new File().

try {
      // Image
      Ocr.setUp()
      val ocr = new Ocr
      ocr.startEngine("eng", Ocr.SPEED_FASTEST)
      val files = List(<your_file>)
      val outputString = ocr.recognize(files.toArray, Ocr.RECOGNIZE_TYPE_ALL, Ocr.OUTPUT_FORMAT_PLAINTEXT)
      ocr.stopEngine()
      Some(outputString)
} catch {
      case e: Exception => None // todo: to support multiple file types
}

I also wrote a blog post with more details on how to extract text/content from other file types (PDF, HTML, image, etc.).

If you want to read more about java-ocr-api, see its official website.

Upvotes: 1

Pratik

Reputation: 11745

You can try Asprise OCR out. It's a good OCR API available in Java.

Upvotes: 0

Mark Byers

Reputation: 839154

No, you can't do that with pngj. The text that is visible in the PNG image is not internally stored as text. You will need OCR software if you wish to identify the text.

However, it would be much better if you could get the data in another format that is easier for a computer to parse.

Upvotes: 4

Lukasz

Reputation: 7662

Yes, it seems to be possible, but you will need a good OCR library. Then, assuming the OCR library returns correct results, you still need to verify somehow that your legends are placed in the proper positions.
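
The position check can be sketched in plain Java, assuming your OCR library can report a bounding box for each recognized piece of text. How you obtain those boxes depends on the library; the OcrItem type, the coordinates, and the expected region below are all made up for illustration:

```java
import java.awt.Rectangle;
import java.util.List;

public class LegendPositionCheck {
    /** Hypothetical holder for one recognized text item and its bounding box in pixels. */
    public static final class OcrItem {
        final String text;
        final Rectangle box;

        public OcrItem(String text, Rectangle box) {
            this.text = text;
            this.box = box;
        }
    }

    /** True if some recognized item matches the legend text and lies entirely inside the expected region. */
    public static boolean legendInRegion(List<OcrItem> items, String legend, Rectangle region) {
        for (OcrItem item : items) {
            if (item.text.trim().equalsIgnoreCase(legend.trim()) && region.contains(item.box)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up OCR results; a real run would fill this list from the OCR library.
        List<OcrItem> items = List.of(
            new OcrItem("Revenue", new Rectangle(520, 40, 80, 16)),
            new OcrItem("Costs", new Rectangle(520, 60, 60, 16)));
        Rectangle legendArea = new Rectangle(500, 20, 150, 100); // where the legend is expected
        System.out.println(legendInRegion(items, "revenue", legendArea)); // prints "true"
        System.out.println(legendInRegion(items, "Profit", legendArea));  // prints "false"
    }
}
```

The comparison is per recognized item, so a multi-word legend would match only if the OCR library reports it as a single item; otherwise the words would have to be joined first.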

Upvotes: 1
