Reputation: 441
This script sorts the files by date, then moves the first 2500 files to another directory.
When I run it, the system prints "Argument list too long".
NUM_OF_FILES=2500
FROM_DIRECTORY=/apps/data01/RAID/RC/MD/IN_MSC/ERC/in
DESTINATION_DIRECTORY=/apps/data01/RAID/RC/MD/IN_MSC/ERC/in_load
if [ ! -d $DESTINATION_DIRECTORY ]
then
echo "unused_file directory does not exist!"
mkdir $DESTINATION_DIRECTORY
echo "$DESTINATION_DIRECTORY directory created!"
else
echo "$DESTINATION_DIRECTORY exist!"
fi
echo "Moving $NUM_OF_FILES oldest files to $DESTINATION_DIRECTORY directory"
ls -tr $FROM_DIRECTORY/MSCERC*.Z | head -$NUM_OF_FILES |
xargs -i sh -c "mv {} $DESTINATION_DIRECTORY"
Upvotes: 9
Views: 4728
Reputation: 1724
I landed here with the same issue. There are other posts worth reading for background.
The shell glob-expands PATTERN* before it can exec /bin/ls, and the expanded argument list exceeds the system ARG_MAX limit, so the command fails before head or xargs is even run.
The issue is with the filtering and sorting-by-date done by /bin/ls -tr PATTERN*. Here is a bash snippet that does the filtering and sorting with find and a while loop instead:
find . -mindepth 0 -maxdepth 1 -print0 |
while IFS= read -r -d '' fname; do
    [[ $fname =~ PATTERN ]] || continue   # PATTERN is a regex, e.g. MSCERC.*[.]Z$
    modsecs="$(stat -c %Y "$fname")"
    echo "${modsecs},${fname}"
done |
sort -n |
head -2500 |
cut -f2- -d,
Placing that in a function, invoking it from the command line, and quoting properly to handle file names with spaces gives the final solution:
# get the oldest 'nfiles' files matching regex 'pattern'
# in the current directory
get_oldest_files_by_patt() {
    local nfiles pattern fname modsecs
    nfiles="$1"
    pattern="$2"
    find . -mindepth 0 -maxdepth 1 -print0 |
    while IFS= read -r -d '' fname; do
        # note: no quotes around ${pattern}, so =~ treats it as a regex
        [[ $fname =~ ${pattern} ]] || continue
        modsecs="$(stat -c %Y "${fname}")"
        echo "${modsecs},${fname}"
    done |
    sort -n | head -"${nfiles}" | cut -f2- -d,
}
# capture output in an array (-t strips the newlines) - handles spaces in filenames
cd "${FROM_DIRECTORY}" || exit 1
mapfile -t oldest_files < <(get_oldest_files_by_patt "${NUM_OF_FILES}" 'MSCERC.*[.]Z$')
# move the files
/bin/mv "${oldest_files[@]}" "${DESTINATION_DIRECTORY}"
- Uses find . -mindepth 0 -maxdepth 1 to get the unfiltered directory contents.
- Originally this solution parsed /bin/ls to remain closer to the OP's problem statement, but as @Charles Duffy pointed out, parsing /bin/ls output is problematic for comparing file metadata.
- Uses find ... -print0 paired with while IFS= read -r -d '' to avoid problems with special characters in file names.
- Filters with bash's =~ regular expression matching operator; note the pattern must be a regex such as MSCERC.*[.]Z$, not a shell glob.
- Injects a numerically sortable modsecs field, separated by a comma, on each line, using stat -c %Y (modification time in seconds since the Epoch).
- Sorts the output of the while loop numerically by the prepended modsecs.
- Selects the oldest 2500 with head -2500 from the sorted output.
- Finally removes modsecs up to the comma with cut.
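For comparison, here is a sketch of the same decorate-sort-undecorate idea as a single pipeline. It is deliberately GNU-only (find -printf, sort -z; head -z and cut -z need coreutils 8.25+; xargs -0 and mv -t), but it is NUL-delimited end to end, so it also survives newlines in file names:
# Sketch, GNU-only: decorate each name with its mtime, NUL-delimited,
# then sort oldest-first, trim, undecorate, and move in batches.
find . -maxdepth 1 -type f -name 'MSCERC*.Z' -printf '%T@\t%p\0' |
    sort -z -n |
    head -z -n "$NUM_OF_FILES" |
    cut -z -f2- |
    xargs -0 -r mv -t "$DESTINATION_DIRECTORY"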
Here's an illustration of the issue: create 100k x* files and 100k y* files; /bin/ls * hits the ARG_MAX limit, but /bin/ls x* does not (on my machine):
$ getconf ARG_MAX
2097152
$ mkdir /tmp/d; cd /tmp/d; for i in `seq 100000`; do touch x$i y$i; done
. . .
$ /bin/ls -tr * | wc -l
-bash: /bin/ls: Argument list too long
$ /bin/ls -tr x* | wc -l
100000
As @Gowtham pointed out, "if you have too many files that match", then the argument list produced by globbing MSCERC*.Z is too big and overflows the ARG_MAX limit on your machine. Technically it is not the number of files but the number of characters (the longer the file names, the fewer files it takes to hit the limit).
Upvotes: 1
Reputation: 36422
Change
ls -tr $FROM_DIRECTORY/MSCERC*.Z|head -2500 | \
xargs -i sh -c "mv {} $DESTINATION_DIRECTORY"
to something like the following:
find "$FROM_DIRECTORY" -maxdepth 1 -type f -name 'MSCERC*.Z' -printf '%p\t%T@\n' | sort -k2,2 -r | cut -f1 | head -$NUM_OF_FILES | xargs mv -t "$DESTINATION_DIRECTORY"
This uses find to create a list of files with modification timestamps, sorts numerically by the timestamp (oldest first, matching the ls -tr of the question), then removes the unneeded field before passing the output to head and xargs.
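To sanity-check the selection before moving anything, a dry run (sketch) can print the command instead of executing it:
# Dry run: putting 'echo' in front of mv makes xargs print the
# command line it would have run.
find "$FROM_DIRECTORY" -maxdepth 1 -type f -name 'MSCERC*.Z' -printf '%p\t%T@\n' |
    sort -t$'\t' -k2,2n | cut -f1 | head -"$NUM_OF_FILES" |
    xargs -d '\n' echo mv -t "$DESTINATION_DIRECTORY"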
EDIT
Another variant that avoids GNU mv's -t option (note that find -printf is still a GNU extension):
find "$FROM_DIRECTORY" -type f -name 'MSCERC*.Z' -printf '%p\t%T@' |sort -k 2,2 -r | cut -f1 | head -$NUM_OF_FILES | xargs -i mv \{\} "$DESTINATION_DIRECTORY"
Upvotes: 2
Reputation: 12592
First of all, create a list file of the files to be moved. Then read that list line by line and handle each file. For example:
#!/bin/bash
NUM_OF_FILES=2500
FROM_DIRECTORY=/apps/data01/RAID/RC/MD/IN_MSC/ERC/in
DESTINATION_DIRECTORY=/apps/data01/RAID/RC/MD/IN_MSC/ERC/in_load
if [ ! -d $DESTINATION_DIRECTORY ]
then
echo "unused_file directory does not exist!"
mkdir $DESTINATION_DIRECTORY
echo "$DESTINATION_DIRECTORY directory created!"
else
echo "$DESTINATION_DIRECTORY exist!"
fi
echo "Moving $NUM_OF_FILES oldest files to $DESTINATION_DIRECTORY directory"
ls -tr "$FROM_DIRECTORY"/MSCERC*.Z | head -"$NUM_OF_FILES" > list
exec 3<list
while IFS= read -r file <&3
do
    mv "$file" "$DESTINATION_DIRECTORY"
done
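Since the glob expansion for ls can itself overflow ARG_MAX, here is a sketch of the same list-then-loop idea with the list built by find instead (assumes GNU find and sort, as in the other answers):
# Build the list without glob expansion: find emits one
# "mtime<TAB>path" line per file; sort puts the oldest first.
find "$FROM_DIRECTORY" -maxdepth 1 -type f -name 'MSCERC*.Z' -printf '%T@\t%p\n' |
    sort -t$'\t' -k1,1n | head -"$NUM_OF_FILES" | cut -f2- > list
while IFS= read -r file
do
    mv "$file" "$DESTINATION_DIRECTORY"
done < list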
Upvotes: 1
Reputation:
You didn't say, but I assume this is where the problem occurs:
ls -tr $FROM_DIRECTORY/MSCERC*.Z|head -2500 | \
xargs -i sh -c "mv {} $DESTINATION_DIRECTORY"
(You can verify it by adding "set -x" to the top of your script.)
The problem is that the kernel has a fixed maximum for the total length of the command line given to a new process, and you're exceeding that in the ls command. You can work around it by not using globbing and instead using grep:
ls -tr "$FROM_DIRECTORY"/ | grep '^MSCERC.*\.Z$' | head -2500 | \
xargs -i sh -c "mv $FROM_DIRECTORY/{} $DESTINATION_DIRECTORY"
(grep uses regular expressions instead of globs, so the pattern looks a little bit different: * in a glob becomes .* in a regex, and the literal dot must be escaped. Note also that ls prints bare file names here, so the mv has to put the directory back in front.)
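As a quick sanity check of the glob-to-regex translation (this only counts the matches, it moves nothing):
# glob:  MSCERC*.Z        (* matches any string; . is literal)
# regex: ^MSCERC.*\.Z$    (.* matches any string; \. is a literal dot)
ls "$FROM_DIRECTORY" | grep -c '^MSCERC.*\.Z$'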
Upvotes: 3
Reputation: 1485
A quick way to fix this would be to change to $FROM_DIRECTORY first, so that you can refer to the files using (shorter) relative paths.
cd "$FROM_DIRECTORY" &&
ls -tr MSCERC*.Z | head -2500 | xargs -i sh -c "mv {} $DESTINATION_DIRECTORY"
This is also not entirely foolproof: if too many files match, even the shorter relative names will eventually exceed the limit.
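To see how much the relative names actually buy you (a sketch; printf is a shell builtin, so these measurements do not themselves hit ARG_MAX):
# Compare the argv bytes needed for absolute vs. relative paths.
cd "$FROM_DIRECTORY"
printf '%s\0' "$FROM_DIRECTORY"/MSCERC*.Z | wc -c   # absolute paths
printf '%s\0' MSCERC*.Z | wc -c                     # relative names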
Upvotes: 1