Tim

Reputation: 321

Calling a shell function using parallel with a list of quoted filenames as input

Using Bash.

I have an exported shell function which I want to apply to many files.

Normally I would use xargs, but syntax like this (see here) is too ugly to use:

...... | xargs -n 1 -P 10 -I {} bash -c 'echo_var "$@"' _ {}

In that discussion, parallel had an easier syntax:

..... | parallel -P 10 echo_var {}

Now I have run into the following problem: the files I want to apply my function to arrive as a single line, each name quoted and separated by spaces, like this: "file 1" "file 2" "file 3".

How can I feed this space-separated, quoted list into parallel?

I can replicate the list using echo for testing.

e.g.

echo '"file 1" "file 2" "file 3"'|parallel -d " " my_function {}

but I can't get this to work.

How can I fix it?

Upvotes: 1

Views: 618

Answers (2)

Ole Tange

Reputation: 33685

The problem boils down to this: the values can contain spaces, and space is also the value separator. So we need something that can parse the input into separate values that may contain spaces. Since the values are bash-quoted, the obvious choice is to let bash do the unquoting.

You have several options:

(echo "file 1";
 echo "file  2";
 echo "file \"name\" \$(3)") | parallel my_function

printf "%s\n" "file 1" "file  2" "file \"name\" \$(3)" |
  parallel my_function

If the input is in a variable:

var='"file 1" "file  2" "file \"name\" \$(3)"'
eval 'printf "%s\n" '"$var" |
  parallel my_function

Or you can convert the variable to an array:

var='"file 1" "file  2" "file \"name\" \$(3)"'
eval "arr=($var)"

And if the input is in an array:

parallel my_function ::: "${arr[@]}"
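
As a minimal sketch of the variable-to-array route above: the snippet below lets bash unquote the string with eval and then prints the elements so the split can be checked. The input string is the same test value used above, and the angle brackets are only there to make the boundaries visible. It assumes the input is trusted, because eval will execute anything in it that is not escaped.

```shell
#!/usr/bin/env bash
# Test input: one line of bash-quoted names, as in the question.
var='"file 1" "file  2" "file \"name\" \$(3)"'

# eval lets bash do the unquoting. Only use this on input you trust:
# an unescaped $(...) or backtick in $var would be executed here.
eval "arr=($var)"

# Print one element per line to confirm where the values were split.
printf '<%s>\n' "${arr[@]}"

# With GNU parallel installed, the array can then be passed directly:
#   parallel my_function ::: "${arr[@]}"
```

The whole assignment is passed to eval as one quoted string so that the parentheses are only parsed by the second round of evaluation.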

Upvotes: 1

KamilCuk

Reputation: 140960

How can I fix it?

You have to choose a unique separator.

echo 'file 1|file 2|file 3' | xargs -d "|" -n1 bash -c 'my_function "$@"' --
echo 'file 1^file 2^file 3' | parallel -d "^" my_function

The safest choice is the zero (NUL) byte as the separator, since it cannot appear in a filename:

printf 'file 1\0file 2\0file 3\0' | xargs -0 -n1 bash -c 'my_function "$@"' --
printf "%s\0" 'file 1' 'file 2' 'file 3' | parallel -0 my_function

Best of all, store your elements in a bash array and process them as a zero-separated stream:

files=("file 1" "file 2" "file 3")
printf "%s\0" "${files[@]}" | xargs -0 -n1 bash -c 'my_function "$@"' --
printf "%s\0" "${files[@]}" | parallel -0 my_function
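
Put together, the array approach might look like the sketch below. The body of my_function is a stand-in made up for illustration, and -P 3 is an arbitrary job count; the export -f line is what makes the function visible to the bash that xargs starts.

```shell
#!/usr/bin/env bash
# Stand-in body for illustration; my_function is the name used in the question.
my_function() {
  printf 'processing <%s>\n' "$1"
}
export -f my_function   # make the function visible to the child bash run by xargs

files=("file 1" "file 2" "file 3")

# -0: NUL-separated input, -n1: one name per call, -P 3: up to 3 jobs at once.
# Output order is not guaranteed with -P, so it is sorted here for a stable result.
out=$(printf '%s\0' "${files[@]}" |
  xargs -0 -n1 -P 3 bash -c 'my_function "$@"' -- | sort)
printf '%s\n' "$out"

# The GNU parallel equivalent, if it is installed:
#   printf '%s\0' "${files[@]}" | parallel -0 my_function
```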

Note that with empty input, xargs still runs the function once, without any arguments. To avoid that, it is sometimes preferable to pass -r (--no-run-if-empty). The option is supported by parallel and is a GNU extension in xargs (xargs on BSD and macOS does not have --no-run-if-empty).
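
One related detail when the stream comes from printf "%s\0": printf applies its format once even when the array is empty, so the stream still contains a single NUL and is not empty from the point of view of the next command, which means -r alone may not be enough. A minimal sketch of guarding on the element count instead (the empty files array is a hypothetical example):

```shell
#!/usr/bin/env bash
files=()   # hypothetical empty list

# printf applies its format once even with no arguments, so this emits one NUL byte.
bytes=$(printf '%s\0' "${files[@]}" | wc -c)
echo "bytes emitted for an empty array: $((bytes))"

# Guard on the element count so that nothing is started for an empty list.
if ((${#files[@]} > 0)); then
  printf '%s\0' "${files[@]}" | xargs -0 -n1 bash -c 'my_function "$@"' --
else
  echo "nothing to do"
fi
```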

Note: xargs by default parses ', " and \. This is why the following is possible and will work:

echo '"file 1" "file 2" "file 3"' | xargs -n1 bash -c 'my_function "$@"' --
echo "'file 1' 'file 2' 'file 3'" | xargs -n1 bash -c 'my_function "$@"' --
echo 'file\ 1 file\ 2 file\ 3' | xargs -n1 bash -c 'my_function "$@"' --
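
The same default parsing can be used to turn the quoted line into a NUL-separated stream, which is one way to feed it to parallel. This is only a sketch for simply quoted names: the quoting rules of xargs are not the same as those of bash (for example, a backslash inside quotes is not an escape), so names with embedded quotes need one of the other approaches.

```shell
# Let xargs strip the quotes, then re-emit the names NUL-terminated.
# tr only makes the NULs visible here; drop it when piping onward.
out=$(echo '"file 1" "file 2" "file 3"' | xargs printf '%s\0' | tr '\0' '\n')
printf '%s\n' "$out"

# Feeding the result to GNU parallel (if installed) would look like:
#   echo '"file 1" "file 2" "file 3"' | xargs printf '%s\0' | parallel -0 my_function
```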

This default parsing can also produce surprising results, so remember to almost always pass the -d (or -0) option to xargs:

$ # note \x replaced by single x
$ echo '\\a\b\c' | xargs
\abc
$ # quotes are parsed and need to match
$ echo 'abc"def' | xargs
xargs: unmatched double quote; by default quotes are special to xargs unless you use the -0 option
$ echo "abc'def" | xargs
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
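
For comparison, here is a sketch of the same three inputs passed through -0, where quotes and backslashes are no longer special. The tr step converts newline-separated input into NUL-separated input, which assumes the values themselves contain no newlines.

```shell
# Convert newlines to NULs so that xargs -0 treats every line as one literal argument.
out=$(printf '%s\n' 'abc"def' "abc'def" '\\a\b\c' |
  tr '\n' '\0' |
  xargs -0 -n1 printf '%s\n')

# Each value comes back unchanged, quotes and backslashes included.
printf '%s\n' "$out"
```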

xargs is a portable tool available almost everywhere, while parallel is a GNU program that has to be installed separately.

Upvotes: 1
