Reputation: 10441
I would like to use xargs to count the number of blocks of 4 lines in a list of compressed files, and do the counting in parallel using 8 CPUs, like this:
find $PWD/ -name "*.ext.gz" | xargs -t -n1 -P8 -I % gunzip -c % | paste - - - - | wc -l
Currently, this one-liner does the calculation, but the only count I can see is the last one.
What do I need to add to see the number coming from wc -l associated with each input file?
Any ideas?
Upvotes: 0
Views: 405
Reputation: 62439
If I understand your question correctly, you are working from a wrong assumption. You appear to expect that
gunzip -c <filename> | paste - - - - | wc -l
will be run for each file that find reports. This is incorrect. The shell splits the command line at each | before xargs ever runs, so xargs only controls the gunzip stage. What actually happens is that
gunzip -c <filename>
is run for each file, the uncompressed output of all the files is combined into one stream, and
paste - - - - | wc -l
is run once on that combined result.
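To see this for yourself, here is a small sketch that builds two throwaway test files (the names a.ext.gz and b.ext.gz and the temporary directory are made up for the demonstration) and runs the same kind of pipeline on them. Each file holds one 4-line block, yet only a single combined number comes out:

```shell
# Create two small test files, each containing exactly one 4-line block.
tmp=$(mktemp -d)
printf '1\n2\n3\n4\n' | gzip > "$tmp/a.ext.gz"
printf '5\n6\n7\n8\n' | gzip > "$tmp/b.ext.gz"

# xargs runs gunzip once per file, but everything gunzip prints flows into
# ONE paste | wc -l, so a single total is produced instead of one per file.
total=$(find "$tmp" -name '*.ext.gz' | xargs -I % gunzip -c % | paste - - - - | wc -l)

# $((...)) strips the padding that some wc implementations add.
echo "combined count: $((total))"
```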
A better approach would be to write a short shell script, say count_groups.sh, that looks something like this:
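#!/bin/bash
# Count the lines in the compressed file given as the first argument
nlines=$(gunzip -c "$1" | wc -l)
# Each group is 4 lines long
(( ngroups = nlines / 4 ))
echo "$1 : $ngroups"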
Then make it executable with chmod +x count_groups.sh, and run
find "$PWD" -name "*.ext.gz" | xargs -t -P8 -I% ./count_groups.sh %
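If you would rather not keep a separate script file, the same per-file counting can probably be done inline by having xargs start a small sh -c command for each file. The following is only a sketch: the function name count_groups_in_dir is made up for illustration, and it assumes your find and xargs support -print0, -0 and -P (GNU and BSD versions do).

```shell
# Hypothetical helper: print "<file> : <number of 4-line groups>" for every
# *.ext.gz file under the directory given as $1, using up to 8 parallel jobs.
count_groups_in_dir() {
    # -print0 / -0 keep file names with spaces or newlines intact.
    # -n1 passes one file name per sh invocation, where it becomes "$1";
    # the trailing "sh" fills $0 of the inline script.
    find "$1" -name '*.ext.gz' -print0 |
        xargs -0 -P8 -n1 sh -c '
            nlines=$(gunzip -c "$1" | wc -l)
            echo "$1 : $((nlines / 4))"
        ' sh
}
```

Call it as count_groups_in_dir "$PWD". Because the jobs run in parallel, the lines may come out in a different order than find lists the files.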
Upvotes: 1