Andrey Fedorov

Reputation: 9669

Command composition in bash

So I have the equivalent of a list of files being output by another command, and it looks something like this:

http://somewhere.com/foo1.xml.gz
http://somewhere.com/foo2.xml.gz
...

I need to run the XML in each file through xmlstarlet, so I'm doing ... | xargs gzip -d | xmlstarlet ..., except I want xmlstarlet to be called once for each line going into gzip, not once on all of the XML documents concatenated together. Is it possible to compose 'gzip -d' and 'xmlstarlet ...' so that xargs supplies one argument at a time to the composite command?

Upvotes: 2

Views: 1301

Answers (4)

Ole Tange

Reputation: 33685

Use GNU Parallel:

cat filelist | parallel 'zcat {} | xmlstarlet >{.}.out'

or if you want to include the fetching of urls:

cat urls | parallel 'wget -O - {} | zcat | xmlstarlet >{.}.out'

It is easy to read, and you get the added benefit of running one job per CPU in parallel. Watch the intro video to learn more: http://www.youtube.com/watch?v=OpaiGYxkSuQ

Upvotes: 1

glenn jackman

Reputation: 246774

If xmlstarlet can operate on stdin instead of having to pass it a filename, then:

some command | xargs -I{} sh -c 'zcat "{}" | xmlstarlet options ...'

The xargs option -I{} means you can use the "{}" placeholder to indicate where the filename should go. It also makes xargs take only one line at a time from its input, so a separate -n 1 is not needed (GNU xargs treats -n and -I as mutually exclusive, and the older -i spelling is deprecated).
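To see the one-pipeline-per-line behaviour without needing xmlstarlet or real URLs, here is a minimal sketch. The function name process_each, the temporary files, and the use of wc -l as a stand-in for the xmlstarlet step are all assumptions made for illustration. The path is passed to sh as $1 rather than pasted into the command string, which avoids quoting surprises.

```shell
# process_each: reads one .gz path per line on stdin and runs a separate
# "gzip -dc FILE | <processor>" pipeline for each path.
# "wc -l" is a hypothetical stand-in for the real xmlstarlet invocation.
process_each() {
    xargs -I{} sh -c 'gzip -dc "$1" | wc -l | tr -d " "' _ {}
}

# Build two small compressed files to play with.
demo_dir=$(mktemp -d)
printf 'a\nb\n'    | gzip -c > "$demo_dir/foo1.xml.gz"
printf 'a\nb\nc\n' | gzip -c > "$demo_dir/foo2.xml.gz"

# One result per input file (2, then 3), not one combined count.
printf '%s\n' "$demo_dir/foo1.xml.gz" "$demo_dir/foo2.xml.gz" | process_each
```

If the processor had been run once on the concatenated stream, it would have reported a single count of 5 instead.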

Upvotes: 0

hmontoliu

Reputation: 4019

Although the right answer is the one suggested by shellter (+1), here is a one-liner "divertimento", provided that the input is what Andrey describes (a command that generates the list of URLs) :-)

~$ eval $(command | awk '{a=a "wget -O - "$0" | gzip -d | xmlstarlet > $(basename "$0" .gz ).new; " } END {print a}')

It just generates one long command line that runs wget -O - http://foo.xml.gz | gzip -d | xmlstarlet > $(basename foo.xml.gz .gz).new for each of the URLs in the input; the resulting command line is then evaluated.
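Before handing a generated string to eval, it can help to print it and look at it. The sketch below runs only the awk half of the one-liner on the two sample URLs from the question; the function name gen_commands is made up for illustration, and nothing is downloaded or executed.

```shell
# gen_commands: turns one URL per line on stdin into a single line of
# ";"-separated pipelines, exactly the text that eval would later run.
gen_commands() {
    awk '{ a = a "wget -O - " $0 " | gzip -d | xmlstarlet > $(basename " $0 " .gz).new; " }
         END { print a }'
}

# Print the generated command line instead of evaluating it.
printf '%s\n' http://somewhere.com/foo1.xml.gz http://somewhere.com/foo2.xml.gz \
    | gen_commands
```

Note that the $(basename ...) parts stay unexpanded in the generated text; they are only evaluated when eval runs the line. Because eval executes whatever the input contains, this approach is only reasonable when the list of URLs is trusted.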

Upvotes: 1

shellter

Reputation: 37258

Why not read your file and process each line separately in the shell? i.e.

fileList=/path/to/my/xmlFileList.txt
# -c makes gzip write the decompressed data to stdout instead of replacing the file
while read -r fName ; do
   gzip -dc "${fName}" | xmlstarlet > "${fName}.new"
done < "${fileList}"
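For anyone who wants to try the loop end to end, here is a small self-contained sketch. The function name decompress_each and the temporary files are invented for the demo, and tr a-z A-Z is only a stand-in for the xmlstarlet step, so treat it as an illustration rather than the exact command to use.

```shell
# decompress_each LIST: for every .gz path listed in the file LIST, write
# the decompressed and processed content to PATH.new, one file at a time.
decompress_each() {
    while IFS= read -r fName; do
        # "tr a-z A-Z" is a hypothetical stand-in for the xmlstarlet step.
        gzip -dc "$fName" | tr a-z A-Z > "$fName.new"
    done < "$1"
}

# Create two tiny compressed documents and a list file naming them.
work=$(mktemp -d)
printf '<a/>\n' | gzip -c > "$work/foo1.xml.gz"
printf '<b/>\n' | gzip -c > "$work/foo2.xml.gz"
printf '%s\n' "$work/foo1.xml.gz" "$work/foo2.xml.gz" > "$work/list.txt"

decompress_each "$work/list.txt"
cat "$work/foo1.xml.gz.new" "$work/foo2.xml.gz.new"
```

Each input ends up with its own .new file, which is the per-document behaviour the question asks for.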

I hope this helps.

Upvotes: 4
