Script to delete old files and leave the newest one in a directory in Linux

I have a backup tool that takes a database backup daily and stores the backups with the following filename format:

*_DATE_*.*.sql.gz

with DATE being in YYYY-MM-DD format.

How can I delete the old files matching the pattern above (by comparing the YYYY-MM-DD in the filenames), leaving only the newest one?

Example:

wordpress_2020-01-27_06h25m.Monday.sql.gz
wordpress_2020-01-28_06h25m.Tuesday.sql.gz
wordpress_2020-01-29_06h25m.Wednesday.sql.gz

At the end, only the last file, wordpress_2020-01-29_06h25m.Wednesday.sql.gz, should remain.

Upvotes: 0

Answers (6)

k11a

Reputation: 1

You can use my Python script "rotate-archives" for smart deletion of backups (https://gitlab.com/k11a/rotate-archives).

An example invocation that deletes archives:

rotate-archives.py test_mode=off age_from-period-amount_for_last_timeslot=7-5,31-14,365-180-5 archives_dir=/mnt/archives

As a result, the following archives remain: those 7 to 30 days old, at intervals of 5 days; those 31 to 364 days old, at intervals of 14 days; and those 365 or more days old, at intervals of 180 days, limited to 5 archives.

However, this requires either moving the date to the beginning of the file name or letting the script add the current date to new files.

Upvotes: 0

Jetchisel

Reputation: 7791

Using two for loops:

#!/bin/bash
shopt -s nullglob  ##: This might not be needed but just in case
                   ##: If there are no files the glob will not expand
latest=
allfiles=()
unwantedfiles=()

for file in *_????-??-??_*.sql.gz; do
  if [[ $file =~ _([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2})_ ]]; then
    allfiles+=("$file")
    [[ $file > $latest ]] && latest=$file  ##: The > is magical inside [[
  fi
done

n=${#allfiles[@]}

if ((n <= 1)); then  ##: No files or only one file don't remove it!!
  printf '%s\n' "Found ${n:-0} ${allfiles[@]:-*sql.gz} file, bye!"
  exit 0    ##: Exit gracefully instead
fi

for f in "${allfiles[@]}"; do
  [[ $latest == "$f" ]] && continue  ##: Skip the latest file in the loop.
  unwantedfiles+=("$f")  ##: Save all files in an array without the latest.
done

printf 'Deleting the following files: %s\n' "${unwantedfiles[*]}"

echo rm -rf "${unwantedfiles[@]}"

This relies heavily on the > operator inside [[, which compares the two strings lexicographically.
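The string comparison the script relies on can be checked in isolation. This is a minimal sketch; the helper name newer is made up for illustration and is not part of the script above. Note that the whole filename is compared, so the result tracks the date only when the names share the same prefix.

```bash
#!/usr/bin/env bash
# newer NAME1 NAME2: succeed if NAME1 sorts after NAME2 as a string.
# Hypothetical helper, not part of the script above.
newer() {
  [[ $1 > $2 ]]
}

a=wordpress_2020-01-28_06h25m.Tuesday.sql.gz
b=wordpress_2020-01-29_06h25m.Wednesday.sql.gz

# Same prefix, so the first differing character is inside the date.
if newer "$b" "$a"; then
  printf '%s is newer\n' "$b"
fi
```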

You can create a new file with an earlier date and the result should still be correct.

The echo is there just to show what is going to happen. Remove it once you're satisfied with the output.

I'm actually using this script via cron now, except for the *.sql.gz part: I only have directories to match, with the same date format, so I use ????-??-??/ as the glob and only ([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}) as the regex pattern.

Upvotes: 0

tshiono

Reputation: 22022

Assuming:

The substring preceding the _DATE_ portion does not contain underscores.
The filenames do not contain newline characters.

Then try the following:

for f in *.sql.gz; do
    echo "$f"
done | sort -t "_" -k 2 | head -n -1 | xargs rm --

If your sort, head, and cut commands support the -z option, the following code is more robust against special characters in the filenames:

for f in *.sql.gz; do
    [[ $f =~ _([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2})_ ]] && \
        printf "%s\t%s\0" "${BASH_REMATCH[1]}" "$f"
done | sort -z | head -z -n -1 | cut -z -f 2- | xargs -0 rm --

It uses the NUL character as the line delimiter, which allows any special characters in the filenames.
It first extracts the DATE portion from the filename and prepends it to the filename as a first field, separated by a tab character.
It then sorts the entries by the DATE string, excludes the last (newest) one, cuts the first field off to recover the filenames, and removes those files.
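The decorate, sort, and undecorate steps described above can also be sketched as a dry run that only lists the files that would be removed. The function name list_stale is hypothetical, and this variant is newline-delimited, so it assumes the filenames contain no tab or newline characters. It uses sed '$d' to drop the last line instead of GNU head -n -1.

```bash
#!/usr/bin/env bash
# list_stale FILE...: print every name except the one with the newest date.
# Hypothetical helper; assumes names contain no tab or newline characters.
list_stale() {
  local f
  for f in "$@"; do
    # Decorate: emit "DATE<TAB>NAME" for names that contain _YYYY-MM-DD_.
    [[ $f =~ _([0-9]{4}-[0-9]{2}-[0-9]{2})_ ]] &&
      printf '%s\t%s\n' "${BASH_REMATCH[1]}" "$f"
  done |
    LC_ALL=C sort |   # sort by the leading date field
    sed '$d' |        # drop the last (newest) entry
    cut -f 2-         # undecorate: keep only the filename
}

# Prints the two older files, oldest first.
list_stale \
  blog_2020-01-29_06h25m.Wednesday.sql.gz \
  wordpress_2020-01-27_06h25m.Monday.sql.gz \
  shop_2020-01-28_06h25m.Tuesday.sql.gz
```

Once the listed names look right, they could be passed to rm, for example with list_stale *.sql.gz | xargs -d '\n' rm -- on systems where xargs supports -d.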

Upvotes: 2

DevOpsGeek

Reputation: 322

Go to the folder that contains the *_DATE_*.*.sql.gz files and try the command below:

ls -ltr *.sql.gz|awk '{print $9}'|awk '/2020/{print $0}' |xargs rm

or use

ls -ltr |grep '2019-05-20'|awk '{print $9}'|xargs rm

Replace /2020/ with the pattern of the files you want to delete; every matching file is removed. For example, to delete the files from 2020-05-01, use /2020-05-01/.

Upvotes: 0

kvantour

Reputation: 26481

Since the pattern (glob) you present us is very generic, we have to make an assumption here.

Assumption: the date is the first sequence that matches the regex [0-9]{4}-[0-9]{2}-[0-9]{2}

Files are of the form: constant_string_<DATE>_*.sql.gz

Because the glob expands in sorted order, the last array element is the newest file:

a=( *.sql.gz )
unset a[${#a[@]}-1]
rm "${a[@]}"

Files are of the form: *_<DATE>_*.sql.gz

Using this, it is easily done in the following way:

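a=( *.sql.gz )
cnt=0; ref="0000-00-00"; refi=
for f in "${a[@]}"; do
   # Remember the index of the file with the newest date seen so far.
   [[ "$f" =~ [0-9]{4}(-[0-9]{2}){2} ]] \
   && [[ "${BASH_REMATCH[0]}" > "$ref" ]] \
   && ref="${BASH_REMATCH[0]}" && refi=$cnt
   ((++cnt))
done
# Drop the newest file from the list; delete nothing if no date was found.
if [[ -n $refi ]]; then
   unset "a[refi]"
   ((${#a[@]})) && rm -- "${a[@]}"
fi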

[[ expression ]] <snip>
An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3)). The return value is 0 if the string matches the pattern, and 1 otherwise. If the regular expression is syntactically incorrect, the conditional expression's return value is 2. If the shell option nocasematch is enabled, the match is performed without regard to the case of alphabetic characters. Any part of the pattern may be quoted to force it to be matched as a string. Substrings matched by parenthesized subexpressions within the regular expression are saved in the array variable BASH_REMATCH. The element of BASH_REMATCH with index 0 is the portion of the string matching the entire regular expression. The element of BASH_REMATCH with index n is the portion of the string matching the nth parenthesized subexpression.

(source: man bash)
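As a small illustration of the BASH_REMATCH behaviour quoted above, the following sketch pulls the date out of one of the example filenames. The function name extract_date is hypothetical and only serves the example.

```bash
#!/usr/bin/env bash
# extract_date NAME: print the first YYYY-MM-DD found in NAME, fail if none.
# Hypothetical helper for illustration.
extract_date() {
  [[ $1 =~ ([0-9]{4})-([0-9]{2})-([0-9]{2}) ]] || return 1
  # Index 0 holds the whole match; indexes 1 to 3 hold year, month, and day.
  printf '%s\n' "${BASH_REMATCH[0]}"
}

extract_date wordpress_2020-01-29_06h25m.Wednesday.sql.gz   # prints 2020-01-29
```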

Upvotes: 0

user10990669

Reputation:

I found this in another question. It serves the purpose, but it selects files by modification time rather than by the date in their filenames.

ls -tp | grep -v '/$' | tail -n +2 | xargs -I {} rm -- {}

Upvotes: 0
