Richard

Reputation: 14625

Manipulate bitmap for best ocr detection

I'm using the Tesseract OCR library to extract text from photos taken of screens. The problem is that most modern cameras also capture the display's individual pixels when taking a photo.

Is there any way to apply a filter or thresholding to the bitmap to "extract" the text into a clearer image, for better results with Tesseract?

See the example. Before processing: [image]

After processing (threshold effect in Photoshop): [image]

Upvotes: 3

Views: 841

Answers (1)

Geobits

Reputation: 22342

Tesseract has a built-in threshold method, TessBaseAPI#ThresholdRect. Have you tried that? If so, what problems did you have with it?

If it didn't work well on some pictures, try looking up "mean" or "adaptive" threshold algorithms. It looks like Tesseract's is a straight (global) threshold, so it may not adapt well to darker or lighter images without some tweaking.
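To illustrate what a "mean" adaptive threshold could look like, here is a minimal sketch in plain Java. It is not part of Tesseract: the class name `AdaptiveThreshold`, the method `meanThreshold`, and the `radius` and `offset` parameters are all made up for this example. It assumes the image has already been converted to a row-major array of grayscale values in the range 0..255, and it compares each pixel against the average of its neighborhood instead of against one global cutoff.

```java
final class AdaptiveThreshold {

    /**
     * Mean adaptive threshold (illustrative sketch, not a Tesseract API).
     *
     * @param gray   row-major grayscale values, 0..255, length width * height
     * @param radius half the window size; the window is (2 * radius + 1) squared,
     *               clipped at the image borders
     * @param offset constant subtracted from the local mean before comparing
     * @return array of the same size containing only 0 (black) or 255 (white)
     */
    static int[] meanThreshold(int[] gray, int width, int height,
                               int radius, int offset) {
        if (gray.length != width * height) {
            throw new IllegalArgumentException("gray.length must equal width * height");
        }

        // Integral image with an extra zero row and column, so any window sum
        // can be read with four lookups.
        int stride = width + 1;
        long[] integral = new long[stride * (height + 1)];
        for (int y = 0; y < height; y++) {
            long rowSum = 0;
            for (int x = 0; x < width; x++) {
                rowSum += gray[y * width + x];
                integral[(y + 1) * stride + (x + 1)] =
                        integral[y * stride + (x + 1)] + rowSum;
            }
        }

        int[] out = new int[width * height];
        for (int y = 0; y < height; y++) {
            int y0 = Math.max(0, y - radius);
            int y1 = Math.min(height - 1, y + radius);
            for (int x = 0; x < width; x++) {
                int x0 = Math.max(0, x - radius);
                int x1 = Math.min(width - 1, x + radius);
                int count = (x1 - x0 + 1) * (y1 - y0 + 1);

                long sum = integral[(y1 + 1) * stride + (x1 + 1)]
                         - integral[y0 * stride + (x1 + 1)]
                         - integral[(y1 + 1) * stride + x0]
                         + integral[y0 * stride + x0];

                // White if pixel > (local mean - offset); multiplied through
                // by count to avoid a division per pixel.
                long pixelTimesCount = (long) gray[y * width + x] * count;
                boolean white = pixelTimesCount > sum - (long) offset * count;
                out[y * width + x] = white ? 255 : 0;
            }
        }
        return out;
    }
}
```

On Android, the grayscale input could be built from the values returned by `Bitmap.getPixels`, and the output written back into a bitmap before handing it to Tesseract. The window size and offset usually need tuning per image set; a window somewhat larger than the stroke width of the text is a reasonable starting guess.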

Upvotes: 2
