user871695

Reputation: 441

Script prints "Argument list too long"

This script sorts the files by date, then moves the oldest 2500 files to another directory.
When I run it, the system prints "Argument list too long".

NUM_OF_FILES=2500
FROM_DIRECTORY=/apps/data01/RAID/RC/MD/IN_MSC/ERC/in
DESTINATION_DIRECTORY=/apps/data01/RAID/RC/MD/IN_MSC/ERC/in_load

if [ ! -d $DESTINATION_DIRECTORY ]  
then  
    echo "unused_file directory does not exist!"  
    mkdir $DESTINATION_DIRECTORY   
    echo "$DESTINATION_DIRECTORY directory created!"  
else   
    echo "$DESTINATION_DIRECTORY exist!"    
fi  

echo "Moving $NUM_OF_FILES oldest files to $DESTINATION_DIRECTORY directory"  

ls -tr  $FROM_DIRECTORY/MSCERC*.Z | head -$NUM_OF_FILES |
    xargs -i sh -c "mv {} $DESTINATION_DIRECTORY"  

Upvotes: 9

Views: 4728

Answers (5)

qneill

Reputation: 1724

I landed here with the same issue. There are other posts worth reading as background.

The shell glob-expands PATTERN* before /bin/ls, head, or xargs is even run, and the expanded argument list exceeds the system ARG_MAX limit when the shell tries to exec /bin/ls.

Solution:

The issue is with the filtering and sorting-by-date done by /bin/ls -tr PATTERN*. Here is a bash snippet that does the filtering and sorting using find and a while loop instead:

find . -mindepth 0 -maxdepth 1 -print0 |
  while IFS= read -r -d '' fname; do
    [[ $fname =~ PATTERN* ]] || continue
    modsecs="$(stat -c %Y "$fname")"
    echo "${modsecs},${fname}"
  done |
  sort -n |
  head -2500 |
  cut -f2- -d,

Placing that in a function, invoking it from the command line, and using quotes properly to handle files with spaces gives the final solution:

# get the oldest 'nfiles' files matching 'pattern'
# in the current directory
get_oldest_files_by_patt() {
  local nfiles pattern fname modsecs
  nfiles="$1"
  pattern="$2"
  find . -mindepth 0 -maxdepth 1 -print0 |
    while IFS= read -r -d '' fname; do
      # note: no quotes around ${pattern}, so it is matched as a regex
      [[ $fname =~ ${pattern} ]] || continue
      modsecs="$(stat -c %Y "${fname}")"
      echo "${modsecs},${fname}"
    done |
    sort -n | head -"${nfiles}" | cut -f2- -d,
}

# capture output in an array - handles spaces in filenames
cd "${FROM_DIRECTORY}"
mapfile -t topn_files < <(get_oldest_files_by_patt "${NUM_OF_FILES}" "MSCERC.*\.Z$")

# move the files
/bin/mv "${topn_files[@]}" "${DESTINATION_DIRECTORY}"

Solution Notes:

  • Uses find . -mindepth 0 -maxdepth 1 to get the unfiltered directory contents

  • Filters with bash's =~ regular-expression matching operator (note the pattern is a regex such as MSCERC.*\.Z$, not a glob)

  • Injects a numerically sortable modsecs field with a comma on each line

    • with stat -c %Y (modification time in seconds since the Epoch)
  • Sorts the output of the while loop numerically by prepended modsecs

  • Selects the top 2500 with head -2500 from the sorted output

  • Finally removes modsecs up to the comma with cut

Example:

Here's an illustration of the issue: create 100k x* files and 100k y* files; /bin/ls * then hits the ARG_MAX limit, but /bin/ls x* does not (on my machine):

$ getconf ARG_MAX
2097152
$ mkdir /tmp/d; cd /tmp/d; for i in `seq 100000`; do touch x$i y$i; done
. . . 
$ /bin/ls -tr * | wc -l
-bash: /bin/ls: Argument list too long
$ /bin/ls -tr x* | wc -l
100000

More about the ARG_MAX Limit:

As @Gowtham pointed out, "if you have too many files that match" then the /bin/ls -tr of the globbed files MSCERC*.Z is too big: it overflows the ARG_MAX limit on your machine. Technically it is not the number of files but the number of characters that matters (the longer the filenames, the fewer files it takes to hit the limit).
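
To estimate how close a given glob comes to the limit without exec'ing anything, here is a rough sketch (assuming bash, where printf is a builtin and therefore not subject to ARG_MAX):

# printf is a bash builtin, so the glob expands without an exec.
# wc -c counts each expanded name plus one NUL byte, which roughly
# matches how the kernel accounts argv strings (the environment
# also counts toward the limit).
cd "$FROM_DIRECTORY"
glob_bytes=$(printf '%s\0' MSCERC*.Z | wc -c)
echo "glob needs ~${glob_bytes} bytes; ARG_MAX is $(getconf ARG_MAX)"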

Upvotes: 1

Hasturkun

Reputation: 36422

Change

ls -tr  $FROM_DIRECTORY/MSCERC*.Z|head -2500 | \
    xargs -i sh -c "mv {} $DESTINATION_DIRECTORY"  

to something like the following:

find "$FROM_DIRECTORY" -maxdepth 1 -type f -name 'MSCERC*.Z' -printf '%p\t%T@\n' | sort -k2,2 -r | cut -f1 | head -$NUM_OF_FILES | xargs mv -t "$DESTINATION_DIRECTORY"

This uses find to create a list of files with modification timestamps, sorts it by timestamp (oldest first, matching the original ls -tr), then removes the timestamp field before passing the output to head and xargs.
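
If filenames could contain newlines, a NUL-delimited variant of the same pipeline is possible; a sketch, assuming GNU coreutils 8.25 or newer for the -z options on sort, head, and cut:

find "$FROM_DIRECTORY" -maxdepth 1 -type f -name 'MSCERC*.Z' \
    -printf '%T@\t%p\0' |       # timestamp first, NUL-terminated records
  sort -z -n -k1,1 |            # numeric sort, oldest first
  head -z -n "$NUM_OF_FILES" |  # keep the oldest N records
  cut -z -f2- |                 # drop the timestamp field
  xargs -0 mv -t "$DESTINATION_DIRECTORY"

The field order is flipped (timestamp first) so that sort -k1,1 and cut -f2- line up with the NUL-terminated records.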

EDIT

Another variant, which avoids GNU mv's -t option and so should work with non-GNU utilities:

find "$FROM_DIRECTORY" -type f -name 'MSCERC*.Z' -printf '%p\t%T@' |sort -k 2,2 -r | cut -f1 | head -$NUM_OF_FILES | xargs -i mv \{\} "$DESTINATION_DIRECTORY"

Upvotes: 2

Cybercartel

Reputation: 12592

First, create a backup list of the files to be treated. Then read the backup file line by line and process it. For example:

#!/bin/bash
NUM_OF_FILES=2500
FROM_DIRECTORY=/apps/data01/RAID/RC/MD/IN_MSC/ERC/in
DESTINATION_DIRECTORY=/apps/data01/RAID/RC/MD/IN_MSC/ERC/in_load

if [ ! -d "$DESTINATION_DIRECTORY" ]
then
    echo "unused_file directory does not exist!"
    mkdir "$DESTINATION_DIRECTORY"
    echo "$DESTINATION_DIRECTORY directory created!"
else
    echo "$DESTINATION_DIRECTORY exists!"
fi

echo "Moving $NUM_OF_FILES oldest files to $DESTINATION_DIRECTORY directory"

ls -tr "$FROM_DIRECTORY"/MSCERC*.Z | head -$NUM_OF_FILES > list
exec 3<list

while IFS= read -r file <&3
do
    mv "$file" "$DESTINATION_DIRECTORY"
done
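
An equivalent form without the explicit file descriptor (a sketch; exec 3<list mainly keeps stdin free for commands run inside the loop):

# Same loop, feeding the list file straight into the loop's stdin.
# Fine here because mv does not read from stdin.
while IFS= read -r file
do
    mv "$file" "$DESTINATION_DIRECTORY"
done < list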

Upvotes: 1

user25148

Reputation:

You didn't say, but I assume this is where the problem occurs:

ls -tr  $FROM_DIRECTORY/MSCERC*.Z|head -2500 | \
    xargs -i sh -c "mv {} $DESTINATION_DIRECTORY"  

(You can verify it by adding "set -x" to the top of your script.)

The problem is that the kernel has a fixed maximum size for the total length of the command line given to a new process, and you're exceeding that in the ls command. You can work around it by not using globbing and instead using grep:

ls -tr  "$FROM_DIRECTORY" | grep '^MSCERC.*\.Z$' |head -2500 | \
    xargs -i sh -c "mv $FROM_DIRECTORY/{} $DESTINATION_DIRECTORY"

(grep uses regular expressions instead of globs, so the pattern looks a little bit different.)
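
As an illustration of the glob-to-regex translation (the filenames here are just the question's; the translation rules are standard):

# glob  MSCERC*.Z  ->  regex ^MSCERC.*\.Z$
#   '*' (any run of characters) becomes '.*'
#   '.' must be escaped as '\.' to match a literal dot
#   ^ and $ anchor the match so grep cannot hit a substring
ls "$FROM_DIRECTORY" | grep -c '^MSCERC.*\.Z$'   # count matching files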

Upvotes: 3

Gowtham

Reputation: 1485

A quick way to fix this would be to change into $FROM_DIRECTORY, so that you can refer to the files using (shorter) relative paths.

cd $FROM_DIRECTORY && ls -tr MSCERC*.Z|head -2500 |xargs -i sh -c "mv {} $DESTINATION_DIRECTORY"

This is also not entirely fool-proof if you have too many files that match.
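
To put a number on the savings (illustrative arithmetic, not part of the original answer): each expanded argument shrinks by the length of the directory prefix.

# The prefix "/apps/data01/RAID/RC/MD/IN_MSC/ERC/in/" is 38 bytes,
# so dropping it saves 38 bytes per file -- about 95,000 bytes
# across 2500 files. A large enough match list can still overflow
# ARG_MAX, though.
echo "bytes saved per file: $(( ${#FROM_DIRECTORY} + 1 ))"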

Upvotes: 1
