719016

Reputation: 10441

xargs print output from wc -l

I would like to use xargs to count the number of blocks of 4 lines in a list of compressed files, and do the counting in parallel using 8 CPUs, like this:

find $PWD/ -name "*.ext.gz" | xargs -t -n1 -P8 -I % gunzip -c % | paste - - - - | wc -l    

Currently, this one-liner runs, but it prints only a single count at the end instead of one count per file.

What do I need to add to see the number from wc -l next to the input file it belongs to? Any ideas?

Upvotes: 0

Views: 405

Answers (1)

twalberg

Reputation: 62439

If I understand your question correctly, you are working from a wrong assumption. You appear to expect that

gunzip -c <filename> | paste - - - - | wc -l

will be run for each file that find reports. This is incorrect. What is actually happening is that

gunzip -c <filename>

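is being run for each file. The shell splits the command line at each | before xargs even starts, so the only command xargs sees is gunzip -c %. The uncompressed output of all the files is combined into one stream, and paste - - - - | wc -l runs once on that combined stream.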
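To see this on a small scale, here is a rough sketch using two throwaway files. The temporary directory and the file names one.ext.gz and two.ext.gz are made up for the demonstration, and -P8 is left out so the result is deterministic.

```shell
#!/bin/bash
# Hypothetical demo data: two gzip files with 4 lines each, in a temp directory.
demo_dir=$(mktemp -d)
printf '%s\n' a b c d | gzip -c > "$demo_dir/one.ext.gz"
printf '%s\n' e f g h | gzip -c > "$demo_dir/two.ext.gz"

# Same shape as the pipeline in the question: xargs runs only gunzip,
# so paste and wc receive the concatenated output of both files.
combined=$(find "$demo_dir" -name "*.ext.gz" | xargs -I % gunzip -c % | paste - - - - | wc -l)

# One number for both files together, not one number per file.
echo "combined count: $combined"

# Remove the demo files again.
rm -rf "$demo_dir"
```

The 8 lines from both files are folded into 2 groups of 4, and wc -l reports that single total.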

A better approach would be to write a short shell script, say count_groups.sh, that looks something like this:

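#!/bin/bash
# Count the lines in one gzip-compressed file and report the number of 4-line groups.
# gunzip -c is used because gzcat is not installed everywhere; "$1" is quoted for paths with spaces.
nlines=$(gunzip -c "$1" | wc -l)
(( ngroups = nlines / 4 ))
echo "$1 : $ngroups"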

Then, run chmod +x count_groups.sh, and run

find "$PWD" -name "*.ext.gz" | xargs -t -P8 -I% ./count_groups.sh %
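If you would rather not keep a separate script file, roughly the same thing can be done inline by handing the whole per-file pipeline to sh -c. This is only a sketch under some assumptions: count_groups_inline is a made-up function name, and -print0 / -0 require a find and xargs that support them (the GNU and BSD versions do).

```shell
#!/bin/bash
# count_groups_inline is a hypothetical helper name, used here for illustration.
# $1: directory to search for *.ext.gz files.
count_groups_inline() {
    # Each file gets its own sh -c invocation, so gunzip, wc and the
    # division by 4 all run once per file, up to 8 files at a time.
    # The trailing _ fills $0 so that the file name arrives as $1.
    find "$1" -name "*.ext.gz" -print0 |
        xargs -0 -n1 -P8 sh -c 'n=$(gunzip -c "$1" | wc -l); echo "$1 : $(( $n / 4 ))"' _
}
```

Call it as count_groups_inline "$PWD". Because the jobs run in parallel, the lines come out in whatever order the files finish, so pipe the result through sort if you need a stable order.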

Upvotes: 1
