user1189851

Reputation: 5041

passing command line arguments from xargs

I have a shell script that does the following:

INPUT_DIR is a directory containing zip files. The zip files look like lograw1.zip, lograw2.zip etc.

I have a program called prog.py that takes input as the zip file using an option -i and outputs a tsv file. The tsv file corresponding to each zip file has a different name.

i.e., lograw1.zip gives an output of logprocessed1.tsv

I have a shell script with this for loop and it works perfectly.

for f in "$INPUT_DIR"/*.zip
  do
    filename=$(basename "$f" .zip)
    tsvfilename="$(basename "${f/raw/processed}" .zip).tsv"
    python /work/prog.py -i "$f" "$OUTPUT_DIR/$tsvfilename"
  done

I want to use xargs in a bash script to read through the zips and submit each one to my program, with the same renaming of the output files. How can I use xargs to place each file name as an argument in the middle of a command? Thanks in advance.

Upvotes: 1

Views: 3006

Answers (1)

David W.

Reputation: 107040

xargs works by reading a list of file names from standard input and running a command with all of those names as arguments. If the list would overflow the command line buffer, xargs splits it up and runs the command several times.

Let's do something simple:

ls | xargs /bin/echo

Let's say your directory looks like this:

  • foo
  • bar
  • barfoo
  • foobar
  • barbar
  • foofoo

The ls command will output:

foo bar barfoo foobar barbar foofoo

This will be passed to /bin/echo and the following will be executed:

/bin/echo foo bar barfoo foobar barbar foofoo

Now, let's say your command line buffer is only about a dozen characters long. Passing all of those files to /bin/echo would overrun it. xargs takes care of this for you by never passing more data than fits, re-executing the /bin/echo command as many times as needed until all the files have been passed to it:

/bin/echo foo bar
/bin/echo barfoo foobar
/bin/echo barbar foofoo

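You can test this by passing the -s parameter to xargs, which limits the number of characters xargs puts on each command line (the limit counts the command name too, so a value as small as -s10 is only illustrative). You can also try the -t option, which echoes exactly what xargs is executing.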
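To see the batching without worrying about exact character counts, here is a small sketch that uses -n (a cap on the number of arguments per command) in place of -s. The function name batch_demo is made up for illustration, and plain echo stands in for /bin/echo.

```shell
# Feed the six example names to xargs, at most two per command line.
# -n 2 caps the argument count; adding -t would also print each
# command to stderr before it runs.
batch_demo() {
    printf '%s\n' foo bar barfoo foobar barbar foofoo | xargs -n 2 echo
}

# Usage: batch_demo
# Prints three lines, one per echo invocation:
#   foo bar
#   barfoo foobar
#   barbar foofoo
```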


The reason for this tutorial is to show that, in order for xargs to work this way, each and every command in your shell script must be able to accept multiple files, and this isn't the case. The /work/prog.py program looks like it takes one file at a time, and so does the basename command.

You'll have to modify your script to take advantage of xargs, probably by using a for loop inside it to handle each of the files it is handed.
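As a rough sketch of that idea, xargs can also be told to hand over one file per invocation with -n 1, and a small inline sh -c script can do the renaming. Everything here is an assumption for illustration: the function name process_zips, the RUN dry-run switch, and the use of find -print0 with xargs -0 (available in GNU and BSD versions, not required by POSIX). The python command line is copied from the question as-is.

```shell
# Dry run by default: prints the command each zip would get.
# Export RUN as an empty string to really execute it.
RUN=${RUN-echo}
export RUN

# Usage: process_zips INPUT_DIR OUTPUT_DIR
process_zips() {
    find "$1" -name '*.zip' -print0 |
        xargs -0 -n 1 sh -c '
            # $1 = output dir (fixed argument), $2 = the one zip xargs appended
            tsv=$(basename "$2" .zip | sed "s/raw/processed/").tsv
            $RUN python /work/prog.py -i "$2" "$1/$tsv"
        ' sh "$2"
}
```

Unlike the loop in the question, this applies the raw-to-processed substitution to the base name only, so a directory that happens to contain "raw" is left alone.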

Think about using find with your shell script which may do what you want:

find . -name "*.zip" -exec script.sh {} \;
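The script.sh in that command is not shown, so here is one guess at what its body could look like, written as a function so it can be tried inline. The names process_one and PROG are hypothetical, and it assumes OUTPUT_DIR is set in the environment.

```shell
# Body of a hypothetical script.sh: process exactly one zip, given as $1.
# PROG defaults to the command from the question; override it to experiment.
process_one() {
    zip=$1
    # lograw1.zip -> logprocessed1.tsv
    tsv=$(basename "$zip" .zip | sed 's/raw/processed/').tsv
    ${PROG:-python /work/prog.py} -i "$zip" "${OUTPUT_DIR:?OUTPUT_DIR must be set}/$tsv"
}

# In a real script.sh the last line would simply be:  process_one "$1"
```

Because find -exec ... {} \; starts the script once per file, the script never has to deal with more than one argument.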

There are issues with xargs (it doesn't handle file names with funky characters by default) and there is a question whether it is any faster. After all:

rm *

Still must remove the files passed to it one at a time, just like this little script does:

for file in *
do
    rm "$file"
done

In the old Unix days, starting a new process carried a lot of overhead, so if you could run a command once instead of multiple times, it could save you time. I don't know if this is worth the trouble now.
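On the earlier point about file names with funky characters, this short sketch shows the difference between the default whitespace splitting and NUL-delimited input. The -0 option is supported by GNU and BSD xargs but is not part of POSIX, and the function names are made up for illustration.

```shell
# Default parsing: the space splits "my file.zip" into two arguments,
# so echo runs twice.
split_default() {
    printf 'my file.zip\n' | xargs -n 1 echo
}

# NUL-delimited parsing: the name stays whole and echo runs once.
# find ... -print0 produces input in this form.
split_nul() {
    printf 'my file.zip\0' | xargs -0 -n 1 echo
}
```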

Upvotes: 4
