Reputation: 1
I have scans from a clip art book. Each page was scanned to a TIFF, and each TIFF has approximately 18-20 clip art images. How would I automate the selection and extraction of each of these 18-20 images, retain color depth/ppi, and save each clip art image as its own image file?
Linking to a version of what I'm describing. Ideally this would take the example image and dump each clip art image in the file to individual files, and would process an image or a directory with minimal user interaction. Happy to use command line, GUI, whatever. macOS, Linux, Windows are all fine.
Thanks for any ideas on how to approach this.
Probably underthinking this. I wondered if Photoshop actions or a Google Cloud Vision process might work, and thought of TensorFlow, or some method to identify image boundaries within a page/file and use the coordinates to kick out each clip art image, but I just stalled at the start. Surely this is something in the CV arsenal; I think I'm just lacking the knowledge of libraries/modules/existing tools and the vocabulary to get started. I couldn't find anything ImageMagick-related. I don't want to hand-select/copy/paste every page.
Upvotes: 0
Views: 714
Reputation: 53081
In ImageMagick, you can do that using connected components processing. But it is very dependent upon getting a good threshold to separate your objects from the background. Note that JPG is not a good format for this: the background color is not uniform and there are compression artifacts, especially near the objects.
What I do is:
Convert to gray and threshold and negate so the objects are white on a black background.
Then I do the connected components processing on the binary image, merging objects that are smaller than 5000 pixels in area into their surroundings. This helps mitigate holes and throws out smaller objects that arise from noise and compression. I then save the bounding box and the centroid for each object found.
I then do a for loop over each object found. I retrieve the bounding box and crop the input original image and save it.
I also use flood fill and +opaque on the binary image to make the main object white and everything else black.
I crop the processed binary image at the same bounding box and put it into the alpha channel.
Then I flatten the image over white so that the background becomes white and save the masked result.
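To make the bounding box and centroid step more concrete, here is a minimal sketch of how the verbose connected-components listing can be filtered and split. The sample text follows the layout ImageMagick prints (id, bounding box, centroid, area, mean color), but the numbers are made up for illustration and are not from the clip art scan.

```shell
#!/usr/bin/env bash
# Sample listing in the layout printed by
# -define connected-components:verbose=true. All values are made up.
sample='Objects (id: bounding-box centroid area mean-color):
  0: 1200x1600+0+0 600.3,799.1 1500000 gray(0)
  1: 300x250+40+60 189.5,184.2 52000 gray(255)
  2: 280x310+500+90 640.1,244.8 61000 gray(255)'

# Keep only the white regions (the objects) and print "bbox centroid".
objects=$(printf '%s\n' "$sample" | grep 'gray(255)' | awk '{print $2, $3}')

# Read one object per line and break its bbox (WxH+X+Y) into parts.
while read -r bbox centroid; do
  IFS='x+' read -r w h x y <<< "$bbox"
  echo "crop ${w}x${h} at (${x},${y}), floodfill seed ${centroid}"
done <<< "$objects"
```

The black background region (id 0) is dropped by the grep, so only the two objects are reported. The bounding box goes to -crop unchanged, and the centroid is the seed point for the flood fill.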
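# Go to the folder holding the scan (adjust the path to your own setup)
cd
cd desktop/clipart_separate
# Split command output on newlines only, so each array entry is one "bbox centroid" pair
OLDIFS=$IFS
IFS=$'\n'
# Make the objects white on black, run connected components, and keep the
# bounding box and centroid of every white region
dataArr=($(convert clipart.jpeg \
    -colorspace gray \
    -threshold 73% \
    -negate \
    -type bilevel \
    -define connected-components:verbose=true \
    -define connected-components:mean-color=true \
    -define connected-components:area-threshold=5000 \
    -connected-components 8 tmp.png | \
    grep "gray(255)" | awk '{print $2, $3}'))
num=${#dataArr[*]}
for ((i=0; i<num; i++)); do
    bbox=$(echo "${dataArr[$i]}" | cut -d' ' -f1)
    centroid=$(echo "${dataArr[$i]}" | cut -d' ' -f2)
    # Plain crop of the original image at this object's bounding box
    convert clipart.jpeg -crop "$bbox" +repage "clipart_$i.jpg"
    # Isolate this object in the binary image with a flood fill at its centroid,
    # put that mask into the alpha channel, then flatten over white
    convert -quiet "clipart_$i.jpg" \
        \( tmp.png -fill red -draw "color $centroid floodfill" -alpha off \
        -crop "$bbox" +repage \
        -fill black +opaque red -fill white +opaque black \) \
        -alpha off -compose copy_opacity -composite \
        -compose over -background white -flatten \
        "clipart_masked_$i.jpg"
done
IFS=$OLDIFS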
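Since the question asks about a whole directory of TIFF scans, here is a rough sketch of how the cropping part of the script above could be wrapped in a loop. The names `out_name` and `split_page` are hypothetical helpers, not ImageMagick features. The 73% threshold and 5000 pixel area are simply carried over from the JPEG example and will probably need tuning for other scans. It only does the plain crop, not the masking step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: output name for object number $2 of page $1.
# The page's own extension is kept, so TIFF input gives TIFF output.
out_name() {
  local page=$1 idx=$2
  local base=${page%.*} ext=${page##*.}
  printf '%s_%02d.%s' "$base" "$idx" "$ext"
}

# Hypothetical helper: crop every object found on one page.
# Threshold and area values are assumptions taken from the JPEG example.
split_page() {
  local page=$1 i=0 bbox
  while read -r bbox _; do
    convert "$page" -crop "$bbox" +repage "$(out_name "$page" "$i")"
    i=$((i + 1))
  done < <(convert "$page" -colorspace gray -threshold 73% -negate -type bilevel \
             -define connected-components:verbose=true \
             -define connected-components:mean-color=true \
             -define connected-components:area-threshold=5000 \
             -connected-components 8 null: | grep 'gray(255)' | awk '{print $2, $3}')
}

# Process every .tif/.tiff in the directory given as the first argument
# (default: current directory). nullglob makes the loop a no-op if none exist.
shopt -s nullglob
for page in "${1:-.}"/*.tif "${1:-.}"/*.tiff; do
  split_page "$page"
done
```

Saved under a name of your choice, it would be run as `bash split_all.sh /path/to/scans`. The crops land next to the source pages, so a second run would pick them up as input; moving them to another folder first avoids that. Cropping should normally carry the bit depth and resolution over to the output TIFFs, but it is worth confirming on one result with `identify -verbose`.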
Upvotes: 2