Reputation: 1686
I have a third-party program that uploads files to a web server. The files are images, stored in different folders under different names, and each one is referenced in a database. The program imports new images and uploads them to those folders. If a file already exists, it takes the existing name, appends a counter, creates a new reference in the database, and removes the old reference. But instead of removing the old file as well, it keeps it.
Let's say we have an image file named "109101.jpg". When a new version arrives, it is uploaded as "109101_1.jpg". This continues up to, for example, "109101_103.jpg". At that point, all 103 files before this one are outdated and could be deleted.
Because the program is third-party and cannot be modified, I am not able to change that behavior. Instead, I need a shell script that walks through those folders and deletes every image except the latest version. So only "109101_103.jpg" survives and all earlier versions are deleted. As a complication, there are also images whose original names already contain two underscores (only these, none with three or more). For example: "109013_35_1.jpg" is the original, the next version is "109013_35_1_1.jpg", and it is now at "109013_35_1_24.jpg". So only "109013_35_1_24.jpg" has to survive.
Right now I have no idea how to solve this problem. Any ideas?
Upvotes: 1
Views: 921
Reputation: 10717
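Here's a one-line pipeline, because I felt like it. It's shown with newlines inserted because I'm not evil.
# Prints the outdated numbered versions (it does not delete anything, and the
# unnumbered original is not listed). Assumes no underscores in directory names.
for F in $(find . -iname '*.jpg' -exec basename {} .jpg \; |
    sed -r -e 's/^([^_]+|[^_]+_[^_]+_[^_]+)_[0-9]+$/\1/' |
    sort -u); do
  find . -regex ".*/${F}_[0-9]+\.jpg" |
    sort -t _ -k 2 -n | sort -n -t _ -k 4 -s | head -n -1
done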
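To see which names the sed expression in the pipeline treats as versions of the same image, it can be run on the sample names from the question. This is only a small sketch: strip_counter is a hypothetical wrapper name, and GNU sed is assumed for the -r option.

```shell
# Hypothetical wrapper around the sed expression used in the pipeline:
# it strips a trailing _N counter and leaves original names untouched.
strip_counter() {
  sed -r -e 's/^([^_]+|[^_]+_[^_]+_[^_]+)_[0-9]+$/\1/'
}

# Sample names from the question, without the .jpg extension.
printf '%s\n' 109101 109101_1 109101_103 109013_35_1 109013_35_1_24 |
  strip_counter | sort -u
# prints:
# 109013_35_1
# 109101
```

Every version collapses to the name of its original, which is what the outer loop then iterates over.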
Upvotes: 1
Reputation: 241838
The following script deletes the outdated files in a given directory:
#! /bin/bash
cd "$1" || exit 1 # Stop if the directory cannot be entered.
shopt -s extglob # Turn on extended patterns.
shopt -s nullglob # Non matched pattern expands to null.
delete=()
for file in +([^_])_+([0-9]).jpg \
            +([^_])_+([0-9])_+([0-9])_+([0-9]).jpg ; do # Only loop over non original files.
    base=${file%_*} # Delete everything after the last _.
    num=${file##*_} # Delete everything before the last _.
    num=${num%.jpg} # Delete the extension.
    [[ -f $base.jpg ]] && rm -- "$base.jpg" # Delete the original file.
    [[ -f "$base"_$((num+1)).jpg ]] && delete+=("$file") # The file itself is scheduled for deletion.
done
(( ${#delete[@]} )) && rm -- "${delete[@]}"
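The numbered files are not deleted immediately, because the check for each file relies on its successor still being present: the glob visits "109101_10.jpg" before "109101_9.jpg", so deleting the former right away would make the latter look like the latest version. Instead, they are remembered in an array and deleted at the end. A numbered file is scheduled only if the file with the next counter exists, so the script assumes the counters are consecutive.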
To apply the script recursively, you can run
find /top/directory -type d -exec script.sh {} \;
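If the counters might have gaps (say "109101_1.jpg" and "109101_5.jpg" with nothing in between), one possible variation is to look for the highest counter per base name instead of checking for a direct successor. The following is a rough sketch under that assumption, not a tested drop-in replacement: parse_name and list_outdated are hypothetical names, bash 4 or later is assumed for associative arrays, and the function only prints the outdated files so the list can be reviewed first.

```shell
#!/bin/bash
# Hypothetical helper: split a file name into base and counter.
# Sets the caller's variables "base" and "num"; an original file counts as version 0.
parse_name() {
  local re='^([^_]+|[^_]+_[0-9]+_[0-9]+)_([0-9]+)\.jpg$'
  if [[ $1 =~ $re ]]; then
    base=${BASH_REMATCH[1]}
    num=$((10#${BASH_REMATCH[2]}))   # force base 10 in case of leading zeros
  else
    base=${1%.jpg}
    num=0
  fi
}

# Hypothetical helper: print every .jpg in directory $1 that has a newer version.
list_outdated() {
  local dir=$1 path base num
  local -A latest=()                 # base name -> highest counter seen

  # Pass 1: record the highest counter for each base name.
  for path in "$dir"/*.jpg; do
    [[ -f $path ]] || continue       # also skips an unmatched glob
    parse_name "${path##*/}"
    (( num > ${latest[$base]:-0} )) && latest[$base]=$num
  done

  # Pass 2: print every file whose counter is below that maximum.
  for path in "$dir"/*.jpg; do
    [[ -f $path ]] || continue
    parse_name "${path##*/}"
    (( num < ${latest[$base]:-0} )) && printf '%s\n' "$path"
  done
  return 0
}

# Example (review the output before deleting anything):
#   list_outdated /top/directory/some/folder
#   list_outdated /top/directory/some/folder | while IFS= read -r f; do rm -- "$f"; done
```

Printing instead of deleting keeps the destructive step separate, which seems prudent for files that cannot be regenerated.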
Upvotes: 0