Reputation: 3488
I'm using Tesseract OCR for text recognition on video frames.
I wrote a program that uses ffmpeg to extract all the key frames of the video and crop them (with static values) to center the text (which could be, for example, subtitles).
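For reference, the extraction and crop step currently looks roughly like this (just a sketch: video.mp4, the output names and the crop geometry are placeholder values):
# keep only the key frames of the video
ffmpeg \
-skip_frame nokey \
-i video.mp4 \
-vsync vfr \
frames/frame_%04d.png
# static crop (width x height + x-offset + y-offset) around the expected text area
convert \
frames/frame_0001.png \
-crop 640x120+40+420 \
+repage \
cropped_0001.png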
I also use ImageMagick and the TextCleaner script, and they improve the OCR quality like magic!
Anyway, sometimes the video quality is not so good, or the video size is a bit small, or the crop doesn't center the text (because the values are static), and the OCR results are very bad.
My question is: how can I detect the right position of the text in the frame for a perfect crop? This should improve the OCR quality and give better results.
Any suggestions would be greatly appreciated. Thanks.
Upvotes: 1
Views: 1627
Reputation: 90263
You could try playing with edge detection, and maybe combine it with your other methods. Like this (pure edge detection):
convert \
big.jpg \
\( \
big.jpg -colorspace gray -edge 8 -negate \
\) \
+append \
-resize 50% \
big-edge-8.png
or:
convert \
big.jpg \
\( \
big.jpg -colorspace gray -edge 25 -negate \
\) \
+append \
-resize 50% \
big-edge-25.png
Here are the two results:
Another option is to reduce the number of colors, apply contrast-stretching and (optionally) a threshold:
convert \
big.jpg \
-colors 400 \
-contrast-stretch 25% \
colors-400-contraststretch-25.png
convert \
big.jpg \
-colors 400 \
-contrast-stretch 25% \
-threshold 50% \
colors-400-contraststretch-25-threshold-50.png
You may also want to play with -canny. It implements the 'Canny' edge detection algorithm and has been available in ImageMagick since version 6.8.9-0. Combine it with -contrast-stretch and -colorspace gray:
convert big.jpg \
-colorspace gray \
-contrast-stretch 45% \
-canny 0x1+10%+30% \
canny1.png
convert big.jpg \
-colorspace gray \
-contrast-stretch 45% \
-canny 0x2+10%+30% \
canny2.png
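To turn the edge map into an actual crop geometry, one possibility (just a sketch, assuming the canny output above gives white edges on a black background, with big.jpg as the original frame) is to let ImageMagick compute the trim bounding box of the edge image via the %@ format escape and feed it back into -crop on the original:
# bounding box of everything that differs from the (black) background
box=$(convert canny1.png -format '%@' info:)
convert big.jpg -crop "$box" +repage text-crop.png
The box will enclose all remaining edges, so the stronger the noise suppression (e.g. via -contrast-stretch) before this step, the tighter the crop around the text.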
Upvotes: 1