Drake Guan

Reputation: 15122

Running piped commands in parallel with GNU Parallel?

Given a task with several commands combined by pipe:

cat input/file1.json | jq '.responses[0] | {labelAnnotations: .labelAnnotations}' > output/file1.json

Now there are thousands of input JSON files, and I would like to use GNU Parallel to process them all in parallel. How could I do that? Something like this?

parallel cat {} | jq '...' > output/{./} ::: input/*.json

Note: It gets even more complicated if there is a pipe inside jq's filter...

Upvotes: 6

Views: 3972

Answers (1)

Ole Tange

Reputation: 33685

https://www.gnu.org/software/parallel/man.html#QUOTING says:

Conclusion: To avoid dealing with the quoting problems it may be easier just to write a small script or a function (remember to export -f the function) and have GNU parallel call that.

In your case it will look like this:

doit() {
  cat "$1" |
    jq '.responses[0] | {labelAnnotations: .labelAnnotations}' > "$2"
}
export -f doit

parallel doit {} output/{/} ::: input/*.json

A nice thing about this is that you can test it:

doit input/foo1.json output/foo1.json

And when that works, parallelizing it is trivial.

If you have a newer version of GNU Parallel, this should work too:

parallel --results output/{/} -q jq '.responses[0] | {labelAnnotations: .labelAnnotations}' ::: input/*.json

Upvotes: 5
