Reputation: 367
I'm trying to develop an app that can read text from an image. To do that, I have to clean up the image background. I've heard that Fred's ImageMagick textcleaner script can be used for this, but I don't know how to use it. Does anyone have any idea how?
Input Image :
Upvotes: 2
Views: 1494
Reputation: 207748
I had a try at this and, while the news is not good, it's still an answer, even if negative. Maybe someone else wants to take my efforts further, or maybe you feel my efforts confirm that textcleaner is not the way to go. Anyway, I took your image and wrote a script to vary the most promising parameters of Fred Weinhaus's textcleaner. I feel that the ones that may help are -f, -o and -t, and I varied these through their likely ranges like this:
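#!/bin/bash
# Sweep textcleaner's -f, -o and -t options and tile every result into one montage
for f in 1 5 10 15 20 25; do
    for o in 1 3 6 9 12; do
        for t in 1 25 50 75 100; do
            out="z_${f}_${o}_${t}.png"
            ./textcleaner -f "$f" -o "$o" -t "$t" cc.jpg "$out"
            # Label each tile with its parameters and stream it to montage as MIFF
            convert -label "f=$f, o=$o, t=$t" "$out" miff:-
        done
    done
done | montage - -frame 5 -tile 6x montage.png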
That gives me this montage of all the results.
To my eye, the most promising was maybe f=10, o=1, t=1.
I then thought "why bother seeing what I like, let's see what Tesseract likes?". So I changed the script to this so that Tesseract got to look at all the permutations:
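#!/bin/bash
# Run Tesseract on every textcleaner result and report the files where it found any digit
for f in 1 5 10 15 20 25; do
    for o in 1 3 6 9 12; do
        for t in 1 25 50 75 100; do
            out="z_${f}_${o}_${t}.png"
            ./textcleaner -f "$f" -o "$o" -t "$t" cc.jpg "$out"
            # Tesseract writes its text to res.txt; its own messages are discarded
            tesseract "$out" res > /dev/null 2>&1
            # Print any recognised lines containing a digit, then the image they came from
            if grep "[0-9]" res*; then echo "$out"; fi
        done
    done
done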
And the results were abysmal. Here is the output:
um 0-" V _
L"“1}- H
z_5_3_50.png
:1:J£‘u “
z_15_3_75.png
”':{E]!) /3: '55‘
z_15_6_75.png
E2?
z_15_9_1.png
:1:
z_15_12_100.png
I -.352}: "H ,1 5
z_20_12_25.png
1/
, ,5». 3».
z_25_6_75.png
3
z_25_9_25.png
- ::'§—:am I-:L’5‘:*‘f§~f.’i'7""“-‘-"I 5="
z_25_12_1.png
7 3:2‘
z_25_12_75.png
Nothing even remotely useful. Maybe someone else has a better idea about how to tune it and which parameters to tweak, but I suspect that textcleaner may be the wrong approach here.
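If anyone does want to take the sweep further, the grep test in the second script only says whether any digit turned up at all. A minimal sketch of a finer comparison is to count the digits in each result and sort on that. The helper name count_digits is made up for illustration, and the count says nothing about whether the digits are the right ones:

```bash
#!/bin/bash
# Hypothetical helper (not part of textcleaner or Tesseract):
# print the number of digit characters in the text file given as $1.
count_digits() {
    # tr -cd deletes every character except 0-9; wc -c counts what is left.
    # The final tr strips the padding some wc implementations add.
    tr -cd '0-9' < "$1" | wc -c | tr -d ' '
}

# Possible use inside the sweep, assuming one text file is kept per image:
#   tesseract "$out" "${out%.png}" > /dev/null 2>&1     # writes z_..._.txt
#   echo "$(count_digits "${out%.png}.txt") $out"
# and then pipe the whole loop through: sort -rn | head
```

Sorting by digit count would at least put the least hopeless parameter combinations at the top, rather than leaving the garbage to be read by eye.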
Upvotes: 8
Reputation: 2903
Without seeing your data first, it's hard to guess. If you have a fairly uniform background, you can use adaptive thresholding to remove it.
Here is some theoretical information on how to use adaptive thresholding. This algorithm is implemented in OpenCV.
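Since the rest of this thread uses ImageMagick, here is a rough sketch of the same idea with ImageMagick's -lat (local adaptive threshold) option instead of OpenCV. The function names are made up, and the 25-pixel window and 10% offset are only starting guesses that would need tuning for your image. It also assumes dark text on a lighter background:

```bash
#!/bin/bash
# Sketch only: window size and offset are guesses to be tuned per image.

# Build the geometry string that -lat expects,
# e.g. "25x25+10%" for a 25-pixel window and a 10% offset.
lat_geometry() {
    local window=${1:-25} offset=${2:-10}
    printf '%sx%s+%s%%\n' "$window" "$window" "$offset"
}

# Hypothetical wrapper: adaptive_threshold INPUT OUTPUT [WINDOW] [OFFSET]
# Negating first makes the text the bright part, so -lat keeps pixels that
# stand out from their local mean; the second -negate restores dark-on-light.
# With ImageMagick 7 the command is "magick" rather than "convert".
adaptive_threshold() {
    local src=$1 dst=$2
    convert "$src" -colorspace Gray -negate \
        -lat "$(lat_geometry "${3:-25}" "${4:-10}")" -negate "$dst"
}

# Example call (cc.jpg is the input name used in the other answer):
#   adaptive_threshold cc.jpg cc_thresh.png 25 10
```

The window should be somewhat larger than the stroke width of the characters, otherwise the inside of thick strokes tends to be treated as background.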
Upvotes: 0