Keenora Fluffball

Reputation: 1686

Shell Script to delete specific image-files recursively

I have a third-party program that uploads files to a web server. The files are images, stored in different folders under different names, and each one is referenced in a database. When the program imports a new image, it uploads it to one of those folders. If a file with that name already exists, the program appends a counter to the name, creates a new reference in the database, and removes the old reference. The old file itself, however, is not removed and stays on disk.

Let's say we have an image file named "109101.jpg". A new version of the file is uploaded as "109101_1.jpg". This continues up to, for example, "109101_103.jpg". At that point, all 103 files before this one are outdated and can be deleted.

Because the program is third-party and cannot be modified, I cannot change that behavior. Instead, I need a shell script that walks through those folders and deletes all images older than the latest one, so that only "109101_103.jpg" survives. In addition, some images have names that already contain two underscores (only two, never three or more). For example, "109013_35_1.jpg" is the original, the next version is "109013_35_1_1.jpg", and the current one is "109013_35_1_24.jpg". In that case only "109013_35_1_24.jpg" should survive.

Right now I have no idea how to approach this. Any suggestions?

Upvotes: 1

Views: 921

Answers (2)

cha0site

Reputation: 10717

Here's a one-line pipeline, because I felt like it. Shown with newlines inserted because I'm not evil. It only prints the superseded numbered files; nothing is deleted.

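# Needs GNU find, sed and head. Assumes no underscores in directory names.
for F in $(find . -iname '*.jpg' -exec basename {} .jpg \; |
             sed -r -e 's/^([^_]+|[^_]+_[^_]+_[^_]+)_[0-9]+$/\1/' |
             sort -u); do
    # List every numbered version of $F, sort by counter, drop the newest.
    find . -regex ".*/${F}_[0-9]+\.jpg" |
        sort -t _ -k 2 -n | sort -n -t _ -k 4 -s | head -n -1
done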
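
The sed stage is what decides which names belong to the same image. Below is a small sketch for trying that rule on sample names without touching any files. strip_counter is a hypothetical helper name, and -E is used in place of -r because both GNU and BSD sed accept it.

```shell
# strip_counter (hypothetical name): read base names without ".jpg" on stdin
# and drop a trailing _N counter, for both the plain and the a_b_c naming scheme.
strip_counter() {
    sed -E 's/^([^_]+|[^_]+_[^_]+_[^_]+)_[0-9]+$/\1/'
}

# Sample names taken from the question.
printf '%s\n' 109101 109101_1 109101_103 109013_35_1 109013_35_1_24 |
    strip_counter | sort -u
```

This should print 109013_35_1 and 109101, one line per image. A name with exactly one underscore, such as 109013_35, would be read as image 109013 with counter 35, so the rule relies on the naming scheme described in the question.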

Upvotes: 1

choroba

Reputation: 241838

The following script deletes the outdated files in a given directory:

#! /bin/bash
cd "$1" || exit 1                                      # Never delete in the wrong directory.
shopt -s extglob                                       # Turn on extended patterns.
shopt -s nullglob                                      # Non matched pattern expands to null.
delete=()
for file in               +([^_])_+([0-9]).jpg \
        +([^_])_+([0-9])_+([0-9])_+([0-9]).jpg ; do    # Only loop over non original files.
    [[ $file ]] || continue                            # No files in the directory.
    base=${file%_*}                                    # Delete everything after the last _.
    num=${file##*_}                                    # Delete everything before the last _.
    num=${num%.jpg}                                    # Delete the extension.
    [[ -f $base.jpg ]] && rm -- "$base.jpg"            # Delete the original file.
    [[ -f ${base}_$((num+1)).jpg ]] && delete+=("$file") # The file itself is scheduled for deletion.
done
(( ${#delete[@]} )) && rm -- "${delete[@]}"

The numbered files are not deleted immediately, because that could remove the successor that another file's check still depends on. Instead, they are collected in an array and deleted at the end.
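
Before letting anything be deleted, it can help to list what this approach would remove. The following is a minimal dry-run sketch under the same naming assumptions. It uses plain sh pattern tests instead of extglob so it also runs outside bash, and list_outdated is a hypothetical name.

```shell
# list_outdated (hypothetical name): print the files in directory "$1" that
# have been superseded. Nothing is deleted.
list_outdated() (
    cd "$1" || exit 1
    for file in *_*.jpg; do
        [ -f "$file" ] || continue          # Glob matched nothing.
        name=${file%.jpg}
        base=${name%_*}                     # Everything before the last _.
        num=${name##*_}                     # Counter after the last _.
        case $num in
            ''|*[!0-9]*) continue ;;        # Counter must be all digits.
        esac
        case $base in
            *_*_*_*) continue ;;            # Too many underscores: not our scheme.
            *_*_*)   ;;                     # a_b_c_N: numbered double-underscore file.
            *_*)     continue ;;            # a_b_N is itself an original (a_b_c.jpg).
            *)       ;;                     # a_N: numbered plain file.
        esac
        # The original is outdated as soon as any numbered version exists.
        [ -f "$base.jpg" ] && printf '%s\n' "$base.jpg"
        # A numbered file is outdated when its direct successor exists.
        # Assumes counters have no leading zeros.
        [ -f "${base}_$((num + 1)).jpg" ] && printf '%s\n' "$file"
    done | LC_ALL=C sort -u
)
```

Calling list_outdated with a directory prints one file name per line. Like the script, it treats a numbered file as outdated only when the file with the next counter exists, so a gap in the counters leaves the file just before the gap in place.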

To apply the script recursively, you can run the following (script.sh must be executable and either be in your PATH or be given with its path):

find /top/directory -type d -exec script.sh {} \;
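
If the per-directory logic lives in a shell function rather than in a separate script file, find -exec cannot call it directly, because -exec only starts external programs. One way around that, sketched here with the hypothetical helper name walk_dirs, is to read the directory list in a loop. It assumes directory names do not contain newlines.

```shell
# walk_dirs (hypothetical name): call the command or shell function named in
# "$2" once for every directory at or below "$1", passing the directory path.
walk_dirs() {
    find "$1" -type d | while IFS= read -r dir; do
        "$2" "$dir"
    done
}

# Example: print every directory below the current one.
walk_dirs . echo
```

The loop runs in a subshell, so variables set by the called function are not visible after walk_dirs returns.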

Upvotes: 0
