Reputation: 95

Count the number of lines matching a given threshold value and return the filename along with the count

I have hundreds of folders, each with a file (file.tsv) containing four columns, and I want to count the number of lines that meet a given threshold in one column.

For example, file.tsv (all files have the same data format):

S.No age weight height  
1    25  65     175  
2    28  75     180  
3    24  72     179  
4    26  80     190

I can get the count using command:

for i in $(find . -name '*.tsv'); do awk '$3>=70' $i | wc -l ; done

It gives output like

3

but I need output like:

file.tsv 3

Upvotes: 0

Answers (3)

Ed Morton

Reputation: 203334

All you need is:

find . -name '*.tsv' |
xargs -n 1 awk '(NR>1) && ($3>=70){ctr++} END{print FILENAME, ctr+0}'

If your file names can contain whitespace (spaces or newlines), add -print0 to the find and -0 to the xargs.
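A minimal sketch of that null-delimited variant, wrapped in a function so it can be pointed at any directory. The function name `count_matches` and its directory argument are hypothetical, introduced only for this example. `-print0` and `-0` are not in POSIX but are supported by both GNU and BSD versions of find and xargs.

```shell
# Sketch: count data rows whose third column is >= 70, one awk run per file.
# File names travel NUL-delimited, so spaces and newlines in paths are safe.
count_matches() {
    # $1: directory to search (hypothetical parameter, defaults to ".")
    find "${1:-.}" -name '*.tsv' -print0 |
    xargs -0 -n 1 awk '(NR>1) && ($3>=70){ctr++} END{print FILENAME, ctr+0}'
}
```

Calling `count_matches .` prints one "path count" line per .tsv file found; `ctr+0` makes files with no matching rows print 0 instead of an empty field.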

Upvotes: 1

Ronald

Reputation: 2872

You were almost there...


    for i in $(find . -name '*.tsv'); do
    awk 'BEGIN {ctr=0}
         $3+0 >= 70 { ctr++ }
         END { print FILENAME " " ctr }' "$i";
    done

By testing ctr in the END block you can also suppress files that have no rows with $3 >= 70, or too few of them. (The edit corrected the flaw pointed out in the OP's comment.)
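One way the END-block test could look, as a sketch rather than the answerer's exact code. The function name `report_if_enough` and the awk variable `min` are assumptions made for this example.

```shell
# Sketch: print "filename count" only for files with at least `min` matching rows.
report_if_enough() {
    # $1: minimum number of matching rows (hypothetical parameter)
    # remaining arguments: the files to check
    min_rows=$1
    shift
    for f in "$@"; do
        awk -v min="$min_rows" '
            $3+0 >= 70 { ctr++ }   # the header text converts to 0, so it never matches
            END { if (ctr+0 >= min+0) print FILENAME, ctr+0 }
        ' "$f"
    done
}
```

For example, `report_if_enough 1 */file.tsv` would list only the files that contain at least one row with a third column of 70 or more.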

Upvotes: 1

mauro

Reputation: 5940

Maybe something like this?

for i in $(find . -name '*.tsv'); do echo -n "$i: " ; awk '$3>=70' $i | wc -l ; done

This is basically your solution, with the filename printed by adding a simple echo -n "$i: " before awk.

I would implement it in a slightly different way:

find . -name '*.tsv' -print0 | xargs -0 -I {} awk 'NR>1 && $3>=70 {count++} END{print FILENAME, count+0}' {}

Upvotes: 1
