Reputation:
I have a backup tool that takes a database backup daily and stores the files using the following name format:
*_DATE_*.*.sql.gz
with DATE being in YYYY-MM-DD format.
How can I delete the old files matching the pattern above (by comparing the YYYY-MM-DD in the filenames), leaving only the newest one?
Example:
wordpress_2020-01-27_06h25m.Monday.sql.gz
wordpress_2020-01-28_06h25m.Tuesday.sql.gz
wordpress_2020-01-29_06h25m.Wednesday.sql.gz
At the end, only the last file, wordpress_2020-01-29_06h25m.Wednesday.sql.gz, should remain.
Upvotes: 0
Views: 1022
Reputation: 1
You can use my Python script "rotate-archives" to delete backups selectively (https://gitlab.com/k11a/rotate-archives).
An example of starting archive deletion:
rotate-archives.py test_mode=off age_from-period-amount_for_last_timeslot=7-5,31-14,365-180-5 archives_dir=/mnt/archives
As a result, the following archives remain: those 7 to 30 days old, at intervals of 5 days; those 31 to 364 days old, at intervals of 14 days; and those 365 days old or older, at intervals of 180 days and up to a count of 5.
However, this requires either moving the _date_ to the beginning of the file name, or letting the script add the current date to new files.
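Moving the date to the front of each name could be sketched in bash as below. This is only an illustration: the helper name `date_first_name` and the target layout `DATE_prefix_rest` are assumptions, and the exact name format that rotate-archives expects should be checked against its own documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rotate-archives): print a new name
# with the YYYY-MM-DD date moved to the front of the file name.
date_first_name() {
    local name=$1
    if [[ $name =~ ^(.*)_([0-9]{4}-[0-9]{2}-[0-9]{2})_(.*)$ ]]; then
        # BASH_REMATCH[1] = prefix, [2] = date, [3] = rest of the name
        printf '%s_%s_%s\n' "${BASH_REMATCH[2]}" "${BASH_REMATCH[1]}" "${BASH_REMATCH[3]}"
    else
        printf '%s\n' "$name"   # no date found: leave the name unchanged
    fi
}

# Possible use (assumed layout): mv -- "$f" "$(date_first_name "$f")"
```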
Upvotes: 0
Reputation: 7791
Using two for loops:
#!/bin/bash
shopt -s nullglob  ##: This might not be needed but just in case.
                   ##: If there are no files the glob will not expand.
latest=
allfiles=()
unwantedfiles=()
for file in *_????-??-??_*.sql.gz; do
  if [[ $file =~ _([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2})_ ]]; then
    allfiles+=("$file")
    [[ $file > $latest ]] && latest=$file  ##: The > is magical inside [[
  fi
done
n=${#allfiles[@]}
if ((n <= 1)); then  ##: No files or only one file, don't remove it!!
  printf '%s\n' "Found ${n:-0} ${allfiles[@]:-*sql.gz} file, bye!"
  exit 0  ##: Exit gracefully instead
fi
for f in "${allfiles[@]}"; do
  [[ $latest == "$f" ]] && continue  ##: Skip the latest file (quoted so it is compared literally).
  unwantedfiles+=("$f")  ##: Save all files in an array without the latest.
done
printf 'Deleting the following files: %s\n' "${unwantedfiles[*]}"
echo rm -rf "${unwantedfiles[@]}"
This relies heavily on the > test operator inside [[, which compares strings lexicographically. Because the whole file name is compared, it works when all files share the same prefix before the date.
You can create a new file with an earlier date to test it; the result should still be correct.
The echo is there just to show what is going to happen. Remove it once you're satisfied with the output.
I'm actually using this script via cron now, except for the *.sql.gz part: I only have directories to match, but with the same date format, so I use ????-??-??/ as the glob and only ([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}) as the regex pattern.
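If the files do not all share one prefix, comparing whole file names can pick the wrong "latest" file. A minimal sketch of a variant that compares only the extracted date follows; the function name `old_backups` is hypothetical, and it assumes file names contain no newline characters, since it prints one name per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every argument except the one with the
# newest embedded YYYY-MM-DD date. Names without a date are skipped,
# so they are never printed (and never deleted).
old_backups() {
    local f d latest= latest_date= dated=()
    for f in "$@"; do
        [[ $f =~ _([0-9]{4}-[0-9]{2}-[0-9]{2})_ ]] || continue
        d=${BASH_REMATCH[1]}
        dated+=("$f")
        # Compare the dates only, not the whole file names.
        if [[ -z $latest || $d > $latest_date ]]; then
            latest=$f
            latest_date=$d
        fi
    done
    for f in "${dated[@]}"; do
        [[ $f == "$latest" ]] || printf '%s\n' "$f"
    done
}
```

Running `old_backups *_????-??-??_*.sql.gz` only prints the candidates, so the list can be inspected before anything is removed. If two files carry the same newest date, the first one encountered is kept and the other is listed.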
Upvotes: 0
Reputation: 22022
Assuming the part of the filename before the _DATE_ portion does not contain underscores (so that the date is the second underscore-separated field), would you try the following:
for f in *.sql.gz; do
    echo "$f"
done | sort -t "_" -k 2 | head -n -1 | xargs rm --
If your sort, head, and cut commands support the -z option, the following code will be more robust against special characters in the filenames:
for f in *.sql.gz; do
    [[ $f =~ _([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2})_ ]] && \
        printf "%s\t%s\0" "${BASH_REMATCH[1]}" "$f"
done | sort -z | head -z -n -1 | cut -z -f 2- | xargs -0 rm --
The -z option uses the NUL character as a line delimiter, which allows any special characters in the filenames.
The loop first extracts the DATE portion from the filename, then prepends it to the filename as a first field separated by a tab character.
The pipeline then sorts the records by the DATE string, excludes the last (newest) one, retrieves the filenames by cutting the first field off, and removes those files.
Upvotes: 2
Reputation: 322
Go to the folder where you have the *_DATE_*.*.sql.gz files and try the command below:
ls -ltr *.sql.gz|awk '{print $9}'|awk '/2020/{print $0}' |xargs rm
or use:
ls -ltr |grep '2019-05-20'|awk '{print $9}'|xargs rm
Replace /2020/ with the pattern you want to delete. For example, to delete the files dated 2020-05-01, use /2020-05-01/.
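Parsing `ls -l` output breaks on file names containing whitespace, and the `$9` column depends on the listing layout. A hedged alternative is to let the shell glob select the files for a given date. In this sketch the function name `backups_for_date` and its arguments are hypothetical; it only lists the matches so they can be reviewed before removal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list backups in directory $1 whose name embeds
# the date $2 (YYYY-MM-DD), one per line, without parsing ls output.
backups_for_date() {
    local dir=$1 date=$2 f
    for f in "$dir"/*_"$date"_*.sql.gz; do
        # Without nullglob an unmatched pattern stays literal; skip it.
        [[ -e $f ]] && printf '%s\n' "$f"
    done
    return 0
}

# Possible use: backups_for_date . 2020-05-01
# To delete instead of list: rm -- ./*_2020-05-01_*.sql.gz
```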
Upvotes: 0
Reputation: 26481
Since the pattern (glob) you present is very generic, we have to make an assumption here.
Assumption: the date pattern is the first sequence that matches the regex
[0-9]{4}-[0-9]{2}-[0-9]{2}
Files are of the form: constant_string_<DATE>_*.sql.gz
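# The glob expands in sorted order, so with a constant prefix the last element is the newest file.
a=( *.sql.gz )
unset "a[${#a[@]}-1]"   # drop the newest file from the list
rm -- "${a[@]}"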
Files are of the form: *_<DATE>_*.sql.gz
Using this, it is easily done in the following way:
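a=( *.sql.gz )
cnt=0; ref="0000-00-00"
for f in "${a[@]}"; do
    # Remember the index of the file with the newest date seen so far.
    [[ "$f" =~ [0-9]{4}(-[0-9]{2}){2} ]] \
        && [[ "$BASH_REMATCH" > "$ref" ]] \
        && ref="${BASH_REMATCH}" && refi=$cnt
    ((++cnt))
done
unset "a[refi]"   # remove the newest file from the list (index refi, not cnt)
rm -- "${a[@]}"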
[[ expression ]]
<snip>
An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3)). The return value is 0 if the string matches the pattern, and 1 otherwise. If the regular expression is syntactically incorrect, the conditional expression's return value is 2. If the shell option nocasematch is enabled, the match is performed without regard to the case of alphabetic characters. Any part of the pattern may be quoted to force it to be matched as a string. Substrings matched by parenthesized subexpressions within the regular expression are saved in the array variable BASH_REMATCH. The element of BASH_REMATCH with index 0 is the portion of the string matching the entire regular expression. The element of BASH_REMATCH with index n is the portion of the string matching the nth parenthesized subexpression.
source: man bash
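To make the BASH_REMATCH indices concrete, here is a small sketch. The function name `split_date` is hypothetical; it prints the whole match (index 0) followed by the three parenthesized groups (indices 1 to 3).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first YYYY-MM-DD date in $1, followed
# by its year, month, and day; return 1 if no date is found.
split_date() {
    if [[ $1 =~ ([0-9]{4})-([0-9]{2})-([0-9]{2}) ]]; then
        # Index 0 is the whole match; 1..3 are the parenthesized groups.
        printf '%s %s %s %s\n' "${BASH_REMATCH[0]}" "${BASH_REMATCH[1]}" \
            "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    else
        return 1
    fi
}

# split_date wordpress_2020-01-29_06h25m.Wednesday.sql.gz
# prints: 2020-01-29 2020 01 29
```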
Upvotes: 0
Reputation:
I found this in another question. It serves the purpose, but it selects the files by modification time rather than by the dates in their filenames.
ls -tp | grep -v '/$' | tail -n +2 | xargs -I {} rm -- {}
Upvotes: 0