Franko

Reputation: 131

Clearing archive files with linux bash script

Here is my problem,

I have a folder that stores multiple files whose names follow a specific format:

Name_of_file.TypeMM-DD-YYYY-HH:MM

where MM-DD-YYYY-HH:MM is the time of its creation. There could be multiple files with the same name but not the same time of course.

What I want is a script that keeps the 3 newest versions of each file.

So, I found one example there: Deleting oldest files with shell

But I don't want to delete a given number of files; I want to keep a given number of the newest files. Is there a way to use that find command, parse the Name_of_file part, and keep the 3 newest?

Here is the code I've tried so far, but it's not exactly what I need.

find /the/folder -type f -name 'Name_of_file.Type*' -mtime +3 -delete

Thanks for help!


So I decided to add my final solution in case anyone would like to use it. It's a combination of the 2 solutions given.

ls -r | grep -P "(.+)\d{4}-\d{2}-\d{2}-\d{2}:\d{2}" | awk 'NR > 3' | xargs rm

One line, super efficient. If the date or name pattern changes, just change the grep -P pattern to match it. This way you can be sure that only files fitting this pattern get deleted.

Upvotes: 3

Views: 1753

Answers (3)

glenn jackman

Reputation: 246807

This pipeline will get you the 3 newest files (by modification time) in the current dir

stat -c $'%Y\t%n' file* | sort -n | tail -3 | cut -f 2-

To get all but the 3 newest:

stat -c $'%Y\t%n' file* | sort -rn | tail -n +4 | cut -f 2-

Upvotes: 1

David W.

Reputation: 107040

Can you be extra, extra sure that the timestamp on the file is the exact same timestamp on the file name? If they're off a bit, do you care?

The ls command can sort files by timestamp order. You could do something like this:

$ ls -t | awk 'NR > 3' | xargs rm
  • The ls -t lists the files by modification time, newest first.
  • The awk 'NR > 3' prints the list of files except for the first three lines, which are the three newest.
  • The xargs rm will remove the files that are older than the first three.

Now, this isn't the exact solution. There are possible problems with xargs because file names might contain weird characters or whitespace. If you can guarantee that's not the case, this should be okay.
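For names with whitespace or other odd characters, one option is to pass NUL-delimited names instead of lines. The following is only a sketch: prune_by_mtime is a hypothetical helper name, and it assumes bash plus GNU find (-printf) and GNU sort (-z).

```shell
#!/bin/bash
# Hypothetical helper: delete all but the KEEP newest regular files in DIR,
# ranked by modification time. Assumes GNU find (-printf) and GNU sort (-z).
prune_by_mtime() {
    local dir=$1 keep=${2:-3}
    # Emit "mtime<TAB>path" records separated by NUL, newest first.
    find "$dir" -maxdepth 1 -type f -printf '%T@\t%p\0' | sort -z -rn |
    {
        local count=0 entry path
        while IFS= read -r -d '' entry; do
            count=$((count + 1))
            if [ "$count" -gt "$keep" ]; then
                path=${entry#*$'\t'}   # drop the "mtime<TAB>" prefix
                rm -- "$path"
            fi
        done
    }
}

# Usage (assumption: run against the archive folder):
#   prune_by_mtime /the/folder 3
```

Because each record ends in a NUL byte rather than a newline, spaces, tabs, and even newlines inside a file name do not split it into several arguments.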

Also, you probably want to group the files by name, and keep the last three. Hmm...

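ls | sed -E 's/[0-9]{2}-[0-9]{2}-[0-9]{4}-[0-9]{2}:[0-9]{2}$//' | sort -u | while read -r file
do
    # list this name's versions newest first, skip three, delete the rest
    ls -t "$file"* | awk 'NR > 3' | xargs rm
done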

The ls will list all of the files in the directory. The sed command will strip the date-time stamp from the end of each file name. The sort -u will make sure you only have the unique file names. Thus

file1.txt-01-12-1950
file2.txt-02-12-1978
file2.txt-03-12-1991

Will be reduced to just:

file1.txt
file2.txt

These are fed through the loop, where ls -t $file* lists all of the files that start with that file name and suffix, newest first. That list is piped to awk, which strips out the newest three, and then to xargs rm, which deletes everything except those newest three.

Upvotes: 1

Shawn Chin

Reputation: 86864

Assuming we're using the date in the filename to date the archive file, and that it is possible to change the date format to YYYY-MM-DD-HH:MM (as established in comments above), here's a quick and dirty shell script to keep the newest 3 versions of each file within the present working directory:

#!/bin/bash
KEEP=3  # number of versions to keep

while read FNAME; do
    NODATE=${FNAME:0:-16}  # get filename without the date (remove last 16 chars)
    if [ "$NODATE" != "$LASTSEEN" ]; then  # new file found
        FOUND=1; LASTSEEN="$NODATE"
    else  # same file, different date
        let FOUND="FOUND + 1"
        if [ $FOUND -gt $KEEP ]; then
            echo "- Deleting older file: $FNAME"
            rm "$FNAME"
        fi
    fi
done < <(\ls -r | grep -P "(.+)\d{4}-\d{2}-\d{2}-\d{2}:\d{2}")

Example run:

[me@home]$ ls
another_file.txt2011-02-11-08:05  
another_file.txt2012-12-09-23:13  
delete_old.sh
not_an_archive.jpg 
some_file.exe2011-12-12-12:11             
some_file.exe2012-01-11-23:11 
some_file.exe2012-12-10-00:11  
some_file.exe2013-03-01-23:11  
some_file.exe2013-03-01-23:12

[me@home]$ ./delete_old.sh 
- Deleting older file: some_file.exe2012-01-11-23:11
- Deleting older file: some_file.exe2011-12-12-12:11

[me@home]$ ls
another_file.txt2011-02-11-08:05
another_file.txt2012-12-09-23:13
delete_old.sh
not_an_archive.jpg
some_file.exe2012-12-10-00:11
some_file.exe2013-03-01-23:11
some_file.exe2013-03-01-23:12

Essentially, by changing the dates in the file names to the form YYYY-MM-DD-HH:MM, a normal string sort (such as that done by ls) will automatically group similar files together, sorted by date-time.
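If existing archives still carry the original MM-DD-YYYY-HH:MM stamp, the reordering could be sketched like this. to_sortable is a hypothetical helper name, and the sketch assumes bash and that the stamp is always the last 16 characters of the name.

```shell
#!/bin/bash
# Hypothetical helper: print NAME with a trailing MM-DD-YYYY-HH:MM stamp
# rewritten as YYYY-MM-DD-HH:MM. Names without such a stamp are printed unchanged.
to_sortable() {
    local re='^(.*)([0-9]{2})-([0-9]{2})-([0-9]{4})-([0-9]{2}:[0-9]{2})$'
    if [[ $1 =~ $re ]]; then
        # groups: 1=name, 2=month, 3=day, 4=year, 5=HH:MM
        printf '%s%s-%s-%s-%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[4]}" \
            "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" "${BASH_REMATCH[5]}"
    else
        printf '%s\n' "$1"
    fi
}

# Usage (assumption: run inside the archive folder; test on a copy first):
#   for f in *; do
#       new=$(to_sortable "$f")
#       [ "$new" = "$f" ] || mv -- "$f" "$new"
#   done
```

With the year first, a later date always compares greater as a plain string, which is what lets ls do the grouping and ordering without any date parsing.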

The ls -r on the last line simply lists all files within the current working directory and prints the results in reverse order, so newer archive files appear first.

We pass the output through grep to extract only files that are in the correct format.

The output of that command combination is then looped through (see the while loop) and we can simply start deleting after 3 occurrences of the same filename (minus the date portion).
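The same counting idea can also be sketched as an awk filter that drops into a one-line pipeline. This is only an illustration: list_excess is a hypothetical helper name, and it assumes every input name ends in a 16-character YYYY-MM-DD-HH:MM stamp and arrives newest first within each name, as ls -r produces.

```shell
#!/bin/bash
# Hypothetical helper: read file names on stdin and print those beyond the
# first KEEP occurrences of each name (name = line minus its last 16 chars).
list_excess() {
    awk -v keep="${1:-3}" '{
        base = substr($0, 1, length($0) - 16)   # name without the date stamp
        if (++seen[base] > keep) print          # 4th and later version of a name
    }'
}

# Usage (dry run, prints what would be deleted):
#   ls -r | grep -E '[0-9]{4}-[0-9]{2}-[0-9]{2}-[0-9]{2}:[0-9]{2}$' | list_excess 3
# Append `| xargs rm` only once the list looks right.
```

Unlike a plain awk 'NR > 3', which counts lines across the whole listing, the per-name counter keeps three versions of every distinct file name.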

Upvotes: 1
