PKKid

Reputation: 3086

Best output format for Xargs

I'm writing a simple program to run through a bunch of files in various directories on my system. It basically involves opening them up and checking for valid XML. One of the program's options is to list the bad XML files.

This leads me to my question: what is the best output format for use with xargs? I thought putting each entry on its own line would be good enough, but it gets confusing because the filenames all contain spaces.

So say my output is:

./dir name 1/file 1.xml
./dir name 2/file 2.xml
./dir name 3/file 3.xml

I tried the following command, but it keeps saying "No such file or directory".

./myprogram.py --list BADXML | xargs -d '\n' cat

So either I am misunderstanding how to use xargs, or I need to slightly change the format of my program's output. I am not sure which is the best (easiest to use) route to take here. I would hate to have to always type a mess of xargs options if I can avoid it.

Upvotes: 3

Views: 1584

Answers (3)

Charles Stewart

Reputation: 11837

You could ditch xargs, and use read:

./myprogram.py --list BADXML | while IFS= read -r line; do cat "$line"; done

Anything xargs can do, while-read loops can do better...

Postscript: In my question "When should xargs be preferred over while-read loops?", the answers made a very strong efficiency case for xargs. It is not too difficult, though, to simulate the argument bunching of xargs with some extra scripting, e.g.

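batch10cat () {
    local line
    local -a argv=()
    while IFS= read -r line; do
        argv+=("$line")
        # Flush once ten file names have accumulated, then start a fresh batch.
        if (( ${#argv[@]} == 10 )); then cat "${argv[@]}"; argv=(); fi
    done
    # Flush whatever is left over as a final, smaller batch.
    if (( ${#argv[@]} > 0 )); then cat "${argv[@]}"; fi
}
./myprogram.py --list BADXML | batch10cat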
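For comparison, here is a rough sketch of the bunching xargs does on its own when given -n. It assumes an xargs that supports -0 (GNU and BSD both do), and the file names are made up for the demo. Each batch is handed to a tiny sh command that only prints how many arguments it received.

```shell
# Generate twelve hypothetical file names containing spaces, each
# terminated by a NUL byte, and let xargs group them ten at a time.
counts=$(
    i=1
    while [ "$i" -le 12 ]; do
        printf 'file %s.xml\0' "$i"
        i=$((i + 1))
    done | xargs -0 -n 10 sh -c 'echo "$#"' sh
)
# One line per batch: the number of arguments that batch received.
printf '%s\n' "$counts"
```

The first batch gets ten names and the second gets the remaining two, with the embedded spaces left intact.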

Upvotes: 1

Ole Tange

Reputation: 33725

With GNU Parallel http://www.gnu.org/software/parallel/ you should be able to do it with no change to myprogram.py:

./myprogram.py --list BADXML | parallel cat

Added bonus: the cat will run in parallel and may thus be faster on multicore computers.

Upvotes: 0

Dyno Fu

Reputation: 9044

man xargs

--null

-0 Input items are terminated by a null character instead of by whitespace, and the quotes and backslash are not special (every character is taken literally). Disables the end of file string, which is treated like any other argument. Useful when input items might contain white space, quote marks, or backslashes. The GNU find -print0 option produces input suitable for this mode.
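A minimal sketch of what that looks like in practice, assuming the program were changed to write a NUL byte after each path instead of a newline. Here printf stands in for the program's output, and the scratch directory and file names are invented for the demo.

```shell
# Hypothetical scratch directory and files, used only for this demo.
tmp=$(mktemp -d)
mkdir "$tmp/dir name 1" "$tmp/dir name 2"
printf 'one\n' > "$tmp/dir name 1/file 1.xml"
printf 'two\n' > "$tmp/dir name 2/file 2.xml"

# Stand-in for the program: every path is followed by a NUL byte,
# so xargs -0 passes each path to cat as a single argument.
out=$(printf '%s\0' \
    "$tmp/dir name 1/file 1.xml" \
    "$tmp/dir name 2/file 2.xml" | xargs -0 cat)
printf '%s\n' "$out"

rm -r "$tmp"
```

With this format the only option needed on the command line is -0, and spaces, quotes, and backslashes in the names need no special handling.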

Upvotes: 2
