Reputation: 5041
I have a shell script that does the following:
INPUT_DIR is a directory containing zip files. The zip files look like lograw1.zip, lograw2.zip etc.
I have a program called prog.py that takes input as the zip file using an option -i and outputs a tsv file. The tsv file corresponding to each zip file has a different name.
i.e., lograw1.zip gives an output of logprocessed1.tsv
I have a shell script with this for loop and it works perfectly.
for f in "$INPUT_DIR"/*.zip
do
    filename=$(basename "$f" .zip)
    tsvfilename="$(basename "${f/raw/processed}" .zip).tsv"
    python /work/prog.py -i "$f" "$OUTPUT_DIR/$tsvfilename"
done
I want to use xargs in a bash script to read through the zips and submit each one to my prog, with the same renaming of the output files. How can I use xargs when the file name has to go in the middle of a command rather than at the end? Thanks in advance.
Upvotes: 1
Views: 3006
Reputation: 107040
xargs works by reading a list of file names from standard input and running a command with all of those names appended as arguments. If the list would overflow the command line buffer, xargs will divide it up and run the command more than once.
Let's do something simple:
ls | xargs /bin/echo
Let's say your directory contains six files. The ls command will output:
foo bar barfoo foobar barbar foofoo
This will be passed to /bin/echo and the following will be executed:
/bin/echo foo bar barfoo foobar barbar foofoo
Now, let's say your input buffer only has room for about a dozen characters of file names. Passing all of those files to /bin/echo at once would overrun it. xargs will take care of this for you by making sure it doesn't pass more data than fits to the /bin/echo command, and will reexecute /bin/echo over and over until all files are passed to it:
/bin/echo foo bar
/bin/echo barfoo foobar
/bin/echo barbar foofoo
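You can test this by passing the -s parameter to xargs with a small value, which limits the number of characters in each command line that xargs builds. The limit counts the command name as well as the arguments, so a value as small as -s10 leaves no room for arguments to /bin/echo; use a somewhat larger one. You can also try the -t option, which will echo exactly what xargs is executing.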
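The batching described above can be reproduced with a small sketch. It uses -n 2 (at most two arguments per invocation) rather than -s, because a count is easier to predict than a byte limit. The six names are just the ones from the example, fed in with printf so no real files are needed.

```shell
# Feed the six example names to xargs, at most two per invocation.
# -n batches by argument count; -s (not used here) batches by size.
batched=$(printf '%s\n' foo bar barfoo foobar barbar foofoo | xargs -n 2 echo)

# Each output line comes from a separate run of echo.
printf '%s\n' "$batched"
```

Adding -t to the xargs call should also print each constructed command line to standard error before it runs.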
The reason for this tutorial is to show that, in order for xargs to work as intended, each and every command in your shell script must be able to accept multiple files, and this isn't the case. /work/prog.py looks like it takes one file at a time, and so does the basename command.
You'll have to modify your script before xargs can drive it, probably by having it use a for loop over all of the file names it is given.
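If you would rather keep one file per invocation, one possible workaround is to let xargs call a small inline sh script with -n 1, so each zip arrives as "$1" and can be placed anywhere in the command. This is only a sketch: the temporary directories and the two empty zip files are hypothetical stand-ins, -print0 and -0 are widely available but not POSIX, and the leading echo makes it a dry run instead of really calling prog.py.

```shell
# Hypothetical stand-ins for the question's directories, so the sketch runs anywhere.
INPUT_DIR=$(mktemp -d)
OUTPUT_DIR=$(mktemp -d)
export OUTPUT_DIR    # the inner sh reads it from the environment
touch "$INPUT_DIR/lograw1.zip" "$INPUT_DIR/lograw2.zip"

# -n 1 hands the inner script exactly one zip per invocation, as "$1".
# The trailing "sh" becomes $0 of the inner script.
# The leading "echo" makes this a dry run; drop it to really call prog.py.
planned=$(find "$INPUT_DIR" -name '*.zip' -print0 | xargs -0 -n 1 sh -c '
    name=$(basename "$1" .zip)
    tsv=$(printf "%s\n" "$name" | sed "s/raw/processed/").tsv
    echo python /work/prog.py -i "$1" "$OUTPUT_DIR/$tsv"
' sh)
printf '%s\n' "$planned"
```

Note that with -n 1 you give up the batching that xargs exists for, so this is mostly a matter of taste compared with the original for loop.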
Think about using find with your shell script, which may do what you want:
find . -name "*.zip" -exec script.sh {} \;
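To make the find approach concrete, here is a rough sketch of what such a script.sh could contain. The script body is hypothetical (the original does not show it), it only reports what it would do, and the sketch writes it into a temporary directory with two empty stand-in zip files so that it can be run as is.

```shell
# Create a throwaway directory holding a hypothetical script.sh and two stand-in zips.
workdir=$(mktemp -d)
cat > "$workdir/script.sh" <<'EOF'
#!/bin/sh
# $1 is the single zip file that find substituted for {}
name=$(basename "$1" .zip)
tsv=$(printf '%s\n' "$name" | sed 's/raw/processed/').tsv
echo "would convert $1 -> $tsv"
EOF
touch "$workdir/lograw1.zip" "$workdir/lograw2.zip"

# find runs the script once per matching file; "\;" ends the -exec command.
found=$(find "$workdir" -name '*.zip' -exec sh "$workdir/script.sh" {} \;)
printf '%s\n' "$found"
```

In a real script the echo line would be replaced by the actual call to prog.py with "$1" and the derived tsv name.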
There are issues with xargs (it doesn't handle file names with funky characters by default), and there is a question of whether it is any faster. After all:
rm *
still must remove the files passed to it one at a time, just like this little script does:
for file in *
do
    rm "$file"
done
In the old Unix days, starting a new process carried a lot of overhead, so if you could run a command once instead of multiple times, it could save you time. I don't know whether it is still worth the trouble.
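On the point about funky file names: the usual workaround is to separate names with NUL bytes instead of whitespace, using find -print0 together with xargs -0 (both widely available, though not part of POSIX). The sketch below uses a hypothetical file name containing a space in a temporary directory and counts how many arguments xargs sees in each mode.

```shell
# A throwaway directory with one file whose name contains a space.
dir=$(mktemp -d)
touch "$dir/my file.zip"

# Default mode: xargs splits its input on whitespace, so the one name is broken apart.
split_count=$(find "$dir" -name '*.zip' | xargs -n 1 echo | wc -l | tr -d ' ')

# NUL-delimited mode: the name reaches the command as a single argument.
safe_count=$(find "$dir" -name '*.zip' -print0 | xargs -0 -n 1 echo | wc -l | tr -d ' ')

echo "default: $split_count arguments, with -0: $safe_count argument"
```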
Upvotes: 4