Reputation: 95
I have hundreds of folders, each with a file (file.tsv) containing four columns, and I want to count the number of lines where a given column meets a threshold.
For example: file.tsv (all files have the same data format)
S.No age weight height
1 25 65 175
2 28 75 180
3 24 72 179
4 26 80 190
I can get the count using command:
for i in $(find . -name '*.tsv'); do awk '$3>=70' $i | wc -l ; done
It gives output like
3
but I need output like:
file.tsv 3
Upvotes: 0
Views: 326
Reputation: 203334
All you need is:
find . -name '*.tsv' |
xargs -n 1 awk '(NR>1) && ($3>=70){ctr++} END{print FILENAME, ctr+0}'
If your file names can contain newlines then add -print0 to the find and -0 to the xargs.
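As a rough sketch of that NUL-delimited variant: the function name count_tsv and its two parameters (directory and threshold) are made up for illustration, and -print0 / -0 are assumed to be available (GNU and BSD find/xargs support them, but they are not required by POSIX).

```shell
# Hypothetical helper: for every *.tsv under a directory, print
# "<file> <count>" where count is the number of data rows (header skipped)
# whose 3rd column is >= the threshold. File names are NUL-delimited, so
# spaces and newlines in paths are handled.
count_tsv() {
    dir=$1
    threshold=$2
    find "$dir" -name '*.tsv' -print0 |
        xargs -0 -n 1 awk -v t="$threshold" \
            '(NR>1) && ($3+0 >= t+0){ctr++} END{print FILENAME, ctr+0}'
}
```

It would be called as count_tsv . 70 to reproduce the command above.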
Upvotes: 1
Reputation: 2872
You were almost there...
for i in $(find . -name '*.tsv'); do
awk 'BEGIN {ctr=0}
$3+0 >= 70 { ctr++ }
{ next }
END { print FILENAME " " ctr }' "$i";
done
By testing ctr in the END block you can also suppress the files not having a $3 >= 70, or having too few of them.
(the edit corrected the flaw as pointed out in the OP's comment)
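A minimal sketch of that END-block test, assuming a made-up helper name count_if_enough and a made-up min parameter for the smallest count worth reporting (neither is part of the original command):

```shell
# Hypothetical helper: first argument is the minimum number of matching
# rows; remaining arguments are the files to scan. A file is reported
# only if at least that many rows have a 3rd column >= 70.
count_if_enough() {
    min_rows=$1
    shift
    for f in "$@"; do
        awk -v min="$min_rows" '
            $3+0 >= 70 { ctr++ }           # +0 forces a numeric compare, so the header never matches
            END { if (ctr+0 >= min+0) print FILENAME " " (ctr+0) }' "$f"
    done
}
```

With min set to 1, files without any $3 >= 70 produce no output line at all.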
Upvotes: 1
Reputation: 5940
Maybe something like this?
for i in $(find . -name '*.tsv'); do echo -n "$i: " ; awk '$3>=70' $i | wc -l ; done
This is basically YOUR solution with the filename printed first, by adding a simple echo -n "$i: " before awk.
I would implement it in a slightly different way:
find . -name '*.tsv' -print0 | xargs -0 -I {} awk 'NR > 1 && $3 >= 70 { count++ } END { print FILENAME, count+0 }' {}
Upvotes: 1