Reputation: 3488
I'm using Tesseract OCR for text recognition on video frames.
I wrote a program that uses ffmpeg to extract all the key frames of the video and crop them (with static values) to center the text (which could be, for example, subtitles).
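For reference, the extraction and crop step currently looks roughly like this (just a sketch: video.mp4, the output names and the crop geometry are placeholder values):
# keep only the key frames of the video
ffmpeg \
-skip_frame nokey \
-i video.mp4 \
-vsync vfr \
frames/frame_%04d.png
# static crop (width x height + x-offset + y-offset) around the expected text area
convert \
frames/frame_0001.png \
-crop 640x120+40+420 \
+repage \
cropped_0001.png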
I also use ImageMagick and the TextCleaner script, and they improve the OCR quality like magic!
Anyway, sometimes the video quality is not so good, or the video size is a bit small, or the crop doesn't center the text (because the values are static), and the OCR results are very bad.
My question is: how can I detect the right position of the text in the frame for a perfect crop? This should improve the OCR quality and give better results.
Any suggestions would be greatly appreciated. Thanks.
Upvotes: 1
Views: 1627
Reputation: 90263
You could try playing with edge detection, and maybe combine it with your other methods. Like this (pure edge detection):
convert \
big.jpg \
\( \
big.jpg -colorspace gray -edge 8 -negate \
\) \
+append \
-resize 50% \
big-edge-8.png
or:
convert \
big.jpg \
\( \
big.jpg -colorspace gray -edge 25 -negate \
\) \
+append \
-resize 50% \
big-edge-25.png
Here are the two results:
Another option is to reduce the number of colors, apply contrast-stretching and (optionally) a threshold:
convert \
big.jpg \
-colors 400 \
-contrast-stretch 25% \
colors-400-contraststretch-25.png
convert \
big.jpg \
-colors 400 \
-contrast-stretch 25% \
-threshold 50% \
colors-400-contraststretch-25-threshold-50.png
You may also want to play with -canny. It implements the 'Canny' edge detection algorithm and has been available in ImageMagick since version 6.8.9-0. Combine it with -contrast-stretch and -colorspace gray:
convert big.jpg \
-colorspace gray \
-contrast-stretch 45% \
-canny 0x1+10%+30% \
canny1.png
convert big.jpg \
-colorspace gray \
-contrast-stretch 45% \
-canny 0x2+10%+30% \
canny2.png
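To turn the edge map into an actual crop geometry, one possibility (just a sketch, assuming the canny output above gives white edges on a black background, with big.jpg as the original frame) is to let ImageMagick compute the trim bounding box of the edge image via the %@ format escape and feed it back into -crop on the original:
# bounding box of everything that differs from the (black) background
box=$(convert canny1.png -format '%@' info:)
convert big.jpg -crop "$box" +repage text-crop.png
The box will enclose all remaining edges, so the stronger the noise suppression (e.g. via -contrast-stretch) before this step, the tighter the crop around the text.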
Upvotes: 1