Reputation: 91
For example, I have the following files in a directory:
FILE_1_2021-01-01.csum
FILE_1_2021-01-01.csv
FILE_1_2021-01-02.csum
FILE_1_2021-01-02.csv
FILE_1_2021-01-03.csum
FILE_1_2021-01-03.csv
I want to keep FILE_1_2021-01-03.csum and FILE_1_2021-01-03.csv in the current directory, but zip the older files and move them to another directory.
So far I have tried the following, but I am stuck on how to correctly identify the pairs:
file_count=0
PATH=/path/to/dir
ARCH=/path/to/dir
for file in ${PATH}/*
do
if [[ ! -d $file ]]
then
file_count=$(($file_count+1))
fi
done
echo "file count $file_count"
if [ $file_count -gt 2 ]
then
echo "moving old files to $ARCH"
# How to do it
fi
Upvotes: 0
Views: 131
Reputation: 52409
Since the timestamps are in a format that naturally sorts out with earliest first, newest last, an easy approach is to just use filename expansion to store the .csv and .csum filenames in a pair of arrays, and then do something with all but the last element of both:
declare -a csv=( FILE_*.csv ) csum=( FILE_*.csum )
mv "${csv[@]:0:${#csv[@]}-1}" "${csum[@]:0:${#csum[@]}-1}" new_directory/
(Or tar them up first, or whatever.)
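As a rough sketch of the "tar them up first" variant, the snippet below builds the same two arrays, archives everything except the newest pair, and removes the originals only if tar succeeded. The sandbox under mktemp and the archive name old_files.tar.gz are assumptions made for the demo, not part of the question. The guard also covers the case where only one pair exists, in which both slices would be empty.

```bash
#!/usr/bin/env bash
# Sketch only: everything happens in a throwaway directory created by mktemp.
shopt -s nullglob                      # an unmatched glob expands to nothing

work=$(mktemp -d)                      # hypothetical sandbox
mkdir "$work/src" "$work/archive"
cd "$work/src" || exit 1
touch FILE_1_2021-01-0{1..3}.{csv,csum}

csv=( FILE_*.csv )
csum=( FILE_*.csum )

# Only archive when there is something older than the newest pair
# and the two lists have the same length.
if (( ${#csv[@]} > 1 && ${#csv[@]} == ${#csum[@]} )); then
    old=( "${csv[@]:0:${#csv[@]}-1}" "${csum[@]:0:${#csum[@]}-1}" )
    # Create the archive first; delete the originals only if tar succeeded.
    tar -czf "$work/archive/old_files.tar.gz" "${old[@]}" && rm -- "${old[@]}"
fi
```

Comparing the array lengths is only a coarse check that every .csv has a .csum; it does not verify that the names actually match up.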
Upvotes: 2
Reputation: 2544
Your algorithm for counting files can be simplified using find. You seem to be looking for non-directories; the option -not -type d does exactly that. By default find descends into subfolders, so you need to pass -maxdepth 1 to limit the search to a depth of 1.
find "$PATH" -maxdepth 1 -not -type d
If you want to get the number of files, you may pipe the command to wc:
file_count=$(find "$PATH" -maxdepth 1 -not -type d | wc -l)
Now there are two ways of detecting which file is the most recent: by looking at the filename, or by looking at the date when the files were created or last modified. Since your naming convention looks pretty solid, I would recommend the first option. Sorting by creation/modification date is more complex, and there are numerous cases where this information is not reliable, such as copying files, zipping/unzipping them, or touching them.
You can sort with sort and then grab the last element with tail -1:
find "$PATH" -maxdepth 1 -not -type d | sort | tail -1
You can do the same thing by sorting in reverse order using sort -r and then grabbing the first element with head -1. Functionally the two are equivalent, but this form can be marginally faster because head stops after the first line instead of reading all of the sorted output. It will also be more relevant later on.
find "$PATH" -maxdepth 1 -not -type d | sort -r | head -1
Once you have the filename of the most recent file, you can extract the base name in order to create a pattern out of it.
most_recent_file=$(find "$PATH" -maxdepth 1 -not -type d | sort -r | head -1)
most_recent_file=${most_recent_file%.*}
most_recent_file=${most_recent_file##*/}
Let's explain the two expansions applied to most_recent_file:
- ${most_recent_file%.*}: the % symbol removes a match from the end of the value, and the pattern .* matches everything from the last dot onward, including the dot itself.
- ${most_recent_file##*/}: the ## symbol removes the longest match from the beginning of the value, and the pattern */ matches everything up to the last slash, including the slash itself.
The difference between # and ## is how greedy the match is. If your file is /path/to/file.csv, then ${most_recent_file#*/} (single #) removes only up to the first slash, i.e. it outputs path/to/file.csv, while ${most_recent_file##*/} (double #) removes the whole directory part, i.e. it outputs file.csv.
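To make the four operators concrete, here is a small sketch that applies each of them to one value. The path is hypothetical and was chosen with two dots so that the shortest and longest suffix matches differ.

```bash
#!/usr/bin/env bash
f=/path/to/FILE_1_2021-01-03.tar.gz    # hypothetical path with two dots

echo "${f%.*}"     # shortest suffix match removed: /path/to/FILE_1_2021-01-03.tar
echo "${f%%.*}"    # longest suffix match removed:  /path/to/FILE_1_2021-01-03
echo "${f#*/}"     # shortest prefix match removed: path/to/FILE_1_2021-01-03.tar.gz
echo "${f##*/}"    # longest prefix match removed:  FILE_1_2021-01-03.tar.gz
```

Note that %% cuts at the first dot anywhere in the value, so it is only safe when the directory part contains no dots.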
Once you have this string, you can make a pattern to include/exclude similar files using find.
find "$PATH" -maxdepth 1 -not -type d -name "$most_recent_file.*"
find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*"
The first line will list all files which match your pattern, and the second line will list all files which do not match the pattern.
Since you want to move your 'old' files to a folder, you may execute a mv command for the last list.
find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*" -exec mv {} "$ARCH" \;
If your version of find supports it, you may use + in order to batch the move operations (this form also relies on mv -t, which is a GNU extension).
find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*" -exec mv -t "$ARCH" {} +
Otherwise you can pipe to xargs.
find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*" | xargs mv -t "$ARCH"
Putting it all together:
# Named SRC rather than PATH: assigning to PATH stops the shell from finding find, sort, and mv.
SRC=/path/to/dir
ARCH=/path/to/dir
file_count=$(find "$SRC" -maxdepth 1 -not -type d | wc -l)
echo "file count $file_count"
if [ "$file_count" -gt 2 ]
then
    echo "moving old files to $ARCH"
    most_recent_file=$(find "$SRC" -maxdepth 1 -not -type d | sort -r | head -1)
    most_recent_file=${most_recent_file%.*}
    most_recent_file=${most_recent_file##*/}
    find "$SRC" -maxdepth 1 -not -type d -not -name "$most_recent_file.*" | xargs mv -t "$ARCH"
fi
As a last note, this will not work if your file names contain newlines. If you want to handle that case, you need a few modifications. Counting files would be done like this (print one character per file and count the characters):
file_count=$(find "$PATH" -maxdepth 1 -not -type d -printf . | wc -c)
Getting the most recent file:
most_recent_file=$(find "$PATH" -maxdepth 1 -not -type d -print0 | sort -rz | grep -zm1 '' | tr -d '\0')
Moving files with xargs:
find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*" -print0 | xargs -0 mv -t "$ARCH"
(There's no problem if moving files using -exec.)
I won’t go into details, but just know that the issue is known and these are the kind of solutions you can apply if need be.
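For reference, one way to combine those newline-safe pieces is to read the NUL-delimited list into a bash array and do the rest with parameter expansion. This is only a sketch under a few assumptions: GNU find and sort, bash 4.4 or later for mapfile -d, and hypothetical sandbox directories (src, arch) created with mktemp.

```bash
#!/usr/bin/env bash
# Sketch only; src and arch are hypothetical sandbox directories.
work=$(mktemp -d)
src=$work/src
arch=$work/archive
mkdir "$src" "$arch"
touch "$src"/FILE_1_2021-01-0{1..3}.{csv,csum}

# Read the NUL-delimited, reverse-sorted names into an array (bash 4.4+).
mapfile -d '' -t files < <(find "$src" -maxdepth 1 -not -type d -print0 | sort -rz)

file_count=${#files[@]}
newest=${files[0]##*/}     # first entry after the reverse sort, directory stripped
newest=${newest%.*}        # extension stripped: the base name of the newest pair

# Move every file whose base name differs from the newest one.
for f in "${files[@]}"; do
    name=${f##*/}
    [[ ${name%.*} == "$newest" ]] || mv -- "$f" "$arch"/
done
```

Because the names never pass through a newline-delimited pipe or a command substitution, unusual characters in them do not matter.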
Upvotes: 0
Reputation: 34484
First off ...
PATH is an OS-level variable that tells the shell where to locate binaries, but in this case it is being overwritten by the assignment PATH=/path/to/dir
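A quick way to see the effect is to make the assignment inside a command substitution (which runs in a subshell, so the calling shell keeps its own PATH) and check the exit status of an external command. The variable names broken_status, ok_status, and src_dir are made up for this sketch.

```bash
#!/usr/bin/env bash
broken_status=$(
    PATH=/path/to/dir                     # same assignment as in the question
    find . -maxdepth 0 >/dev/null 2>&1    # external command, looked up via PATH
    echo "$?"                             # echo is a builtin, so it still works
)
echo "PATH overwritten: find exited with $broken_status"   # 127 means command not found

src_dir=.                                 # hypothetical lowercase name instead
find "$src_dir" -maxdepth 0 >/dev/null 2>&1
ok_status=$?
echo "PATH intact: find exited with $ok_status"
```

Using lowercase names for your own variables avoids this kind of clash with variables the shell and OS already rely on.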
As for the question, some assumptions:
- each *.csv file has a matching *.csum file
- it is not clear what OP means by 'zip' (zip? gzip? tar all old files into a single .tar and then (g)zip?), so for the sake of this answer I'm going to just gzip each file and move it to a new directory (OP can adjust the code to fit the actual requirement)
Setup:
srcdir='/tmp/myfiles'
arcdir='/tmp/archive'
rm -rf "${srcdir}" "${arcdir}"
mkdir -p "${srcdir}" "${arcdir}"
cd "${srcdir}"
touch FILE_1_2021-01-0{1..3}.{csum,csv} abc XYZ
ls -1
FILE_1_2021-01-01.csum
FILE_1_2021-01-01.csv
FILE_1_2021-01-02.csum
FILE_1_2021-01-02.csv
FILE_1_2021-01-03.csum
FILE_1_2021-01-03.csv
XYZ
abc
Get list of *.csum/*.csv files and sort in reverse order:
$ find "${srcdir}" -maxdepth 1 -type f \( -name '*.csum' -o -name '*.csv' \) | sort -r
/tmp/myfiles/FILE_1_2021-01-03.csv
/tmp/myfiles/FILE_1_2021-01-03.csum
/tmp/myfiles/FILE_1_2021-01-02.csv
/tmp/myfiles/FILE_1_2021-01-02.csum
/tmp/myfiles/FILE_1_2021-01-01.csv
/tmp/myfiles/FILE_1_2021-01-01.csum
Eliminate first 2 files (ie, generate list of files to zip/move):
$ find "${srcdir}" -maxdepth 1 -type f \( -name '*.csum' -o -name '*.csv' \) | sort -r | tail -n +3
/tmp/myfiles/FILE_1_2021-01-02.csv
/tmp/myfiles/FILE_1_2021-01-02.csum
/tmp/myfiles/FILE_1_2021-01-01.csv
/tmp/myfiles/FILE_1_2021-01-01.csum
Process our list of files:
while IFS= read -r fname
do
    gzip "${fname}"
    mv "${fname}".gz "${arcdir}"
done < <(find "${srcdir}" -maxdepth 1 -type f \( -name '*.csum' -o -name '*.csv' \) | sort -r | tail -n +3)
NOTE: the find|sort|tail results could be piped to xargs (or parallel) to perform the gzip/mv operations, but without more details on what OP means by 'zip and move' I've opted for a simpler, albeit less performant, while loop.
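For comparison, the xargs variant mentioned in the note could look roughly like the sketch below. It assumes GNU tools (sort -z, tail -z, xargs -0 -r) and uses a sandbox created with mktemp in place of /tmp/myfiles and /tmp/archive; the inline sh -c script is just one possible way to chain gzip and mv per file.

```bash
#!/usr/bin/env bash
# Sketch only; srcdir and arcdir point into a throwaway sandbox.
work=$(mktemp -d)
srcdir=$work/myfiles
arcdir=$work/archive
mkdir "$srcdir" "$arcdir"
touch "$srcdir"/FILE_1_2021-01-0{1..3}.{csum,csv}

# NUL-delimited throughout, so odd characters in names are harmless.
# tail -z -n +3 skips the first two entries, i.e. the newest pair.
find "$srcdir" -maxdepth 1 -type f \( -name '*.csum' -o -name '*.csv' \) -print0 |
    sort -rz |
    tail -z -n +3 |
    xargs -0 -r -I{} sh -c 'gzip -- "$1" && mv -- "$1.gz" "$2"' _ {} "$arcdir"
```

With -I{}, xargs still runs one sh per file, so this is not faster than the while loop; it mainly shows how the pipeline can stay NUL-delimited from end to end.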
Results:
$ ls -1 "${srcdir}"
FILE_1_2021-01-03.csum
FILE_1_2021-01-03.csv
XYZ
abc
$ ls -1 "${arcdir}"
FILE_1_2021-01-01.csum.gz
FILE_1_2021-01-01.csv.gz
FILE_1_2021-01-02.csum.gz
FILE_1_2021-01-02.csv.gz
Upvotes: 0