Why is GNU parallel as slow as single-CPU xargs for this command?

Question

I have a bash command that takes a directory full of XML files, transforms each one to CSV with XSLT, and combines all of the results into a single file. I've been trying to use GNU parallel for this, but CPU usage never goes above 100%. I can't use xargs -P for this because the output of the concurrent jobs gets interleaved.

This takes ~30 seconds, but again, the output is interleaved:

    find /path/to/xml -type f -iname '*.xml' -print0 | xargs -0 -P8 xsltproc transform.xsl > out.txt

This takes ~90 seconds on a single core:

    find /path/to/xml -type f -iname '*.xml' -print0 | xargs -0 xsltproc transform.xsl > out.txt

This also takes ~90 seconds, as slow as the single-core run, and CPU usage in top never goes above 100%:

    find /path/to/xml -type f -iname '*.xml' -print0 | parallel -0 xsltproc transform.xsl > out.txt

This seems so dead simple, I don't know what I'm missing. Could anyone offer a suggestion?


Answers (1)
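
One possible explanation, not confirmed here, is that parallel starts a separate xsltproc for every file, so per-job overhead dominates when the files are small. GNU parallel's -X option packs several arguments into each job and may be worth trying. Independently of that, the interleaving problem with xargs -P can be sidestepped by having each batch write to its own temporary file and concatenating the parts afterwards. The following is only a sketch of that idea: transform_all is a hypothetical helper name, and the batch size of 50 and the 8 workers are arbitrary assumptions. It relies on find -print0 and xargs -0/-P, which GNU and BSD tools provide but POSIX does not require.

```shell
#!/bin/sh
# transform_all is a hypothetical helper, not part of xargs or GNU parallel.
# Usage: transform_all SRC_DIR OUT_FILE COMMAND [ARGS...]
transform_all() {
    src_dir=$1
    out_file=$2
    shift 2                          # "$@" is now the command to run on each batch
    tmp_dir=$(mktemp -d) || return 1

    # Each batch of up to 50 files (arbitrary) is handled by one process that
    # writes to its own part file, so output from different batches cannot mix.
    # Inside the sh -c script, $0 is the temp directory and "$@" is the command
    # followed by the file names that xargs appended.
    find "$src_dir" -type f -iname '*.xml' -print0 |
        xargs -0 -n 50 -P 8 sh -c \
            'part=$(mktemp "$0/part.XXXXXX") && "$@" > "$part"' \
            "$tmp_dir" "$@"

    # Concatenate the parts; batch order is not guaranteed to match find's order.
    cat "$tmp_dir"/part.* > "$out_file"
    rm -rf "$tmp_dir"
}
```

For the command in the question this would be called as: transform_all /path/to/xml out.txt xsltproc transform.xsl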
