Nelson Francisco Ruiz

Reputation: 75

How to make Tesseract OCR output words in a sentence form?

Sample Image Input

The result I get is like this: https://i.sstatic.net/dM0qG.png

Is it possible to make Tesseract give an output in sentence/paragraph form like this?

This is to certify that you have successfully PASSED the PHIL-IT General Certification Examination held on January 26, 2015 at the Cebu Institute of Technology - University, N. Bacalso Avenue, Cebu City 6000 Philippines.

Upvotes: 2

Views: 1555

Answers (1)

Xavier Peña

Reputation: 7899

Since result is a List of Tessnet2.Word, and the text of each Word is stored in item.Text, you can:

  1. Create a list with only the words (not the full Tessnet2.Word object)
  2. Join this list, using "space" as the separator

Say your results are stored in a variable named result (that is, you ran var result = ocr.DoOCR(image, null);). Combining both steps looks like this:

// Needs "using System.Linq;" for Select. On .NET 4 or later the ToList() call is optional.
string phrase = string.Join(" ", result.Select(x => x.Text).ToList());

The result is:

This is to certify that you have successfully PASSED the Phil-lT General Certification Examination held on [nnuag 26, 2015 at the Cebu Institute uf Tedmnlngy · University , N. Bacalso Avenue, Cebu City 6000 Philippines.

(it has some detection errors, but that is a different issue)
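
For anyone who wants to try the join without tessnet2 installed, here is a minimal sketch of the same two steps. The Word class below is a hypothetical stand-in that models only the Text member, and SentenceBuilder.ToSentence is a name made up for this example. Skipping empty entries is my own addition, on the assumption that blank words may show up in the list. It targets .NET 4 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for tessnet2.Word; only the Text member is modelled.
public class Word
{
    public string Text;

    public Word(string text)
    {
        Text = text;
    }
}

public static class SentenceBuilder
{
    // Step 1: project each Word to its text.
    // Step 2: join the texts with a single space.
    // Entries that are null or whitespace are skipped (an assumption, not
    // something the original one-liner does).
    public static string ToSentence(IEnumerable<Word> words)
    {
        IEnumerable<string> texts = words
            .Select(w => w.Text)
            .Where(t => !string.IsNullOrWhiteSpace(t));

        return string.Join(" ", texts);
    }
}
```

With the real library you would pass the list returned by DoOCR straight in, e.g. string phrase = SentenceBuilder.ToSentence(result); after changing the parameter type to the library's own Word class.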

Upvotes: 2
