Tiago Bruno

Reputation: 413

How can I count rows based on several ranges in awk?

I have a file with 2 columns like:

mm6 8
mm6 1
mm6 15
mm6 30
mm9 2
mm6 20
mm6 12

I am trying to write an awk script that counts how many rows fall within a range. So far I have:

awk '{ if ($2 >= 1 && $2 <= 20) print $1 " " $2 }' file

In the output I get all rows whose second column falls into that range:

mm6 8
mm6 1
mm6 15
mm9 2
mm6 20
mm6 12

Now I would like awk to count, for each identifier, how many rows fall into each bin of 10 units (1-10, 11-20, and so on) up to a chosen limit, 100 for example.

I expect output like this:

mm6 10 2
mm6 20 3
mm9 10 1

Explanation: mm6 has 2 values between 1 and 10 and 3 values between 11 and 20; mm9 has 1 value between 1 and 10.

I am stuck. Can someone help?

Upvotes: 2

Views: 418

Answers (1)

karakfa

Reputation: 67547

awk to the rescue!

Using your first input:

$ awk '{a[$1 FS 10*int(($2-1)/10)+10]++}
    END{for(k in a) print k,a[k]}' file

mm6 10 2
mm6 20 3
mm6 30 1
mm9 10 1

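You can add filters before or after the counting step. Note that the iteration order of for (k in a) is unspecified in awk, so pipe the result through sort if the order matters.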
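As a rough sketch of what such a filter could look like, the snippet below applies the upper limit before counting and sorts the result. The function name count_bins and the variable max are made up for illustration and are not part of the original answer; the sample file is the input from the question.

```shell
# Sample data from the question, written to a file named "file".
printf '%s\n' 'mm6 8' 'mm6 1' 'mm6 15' 'mm6 30' 'mm9 2' 'mm6 20' 'mm6 12' > file

# count_bins FILE MAX  (hypothetical helper; max is an assumed cutoff)
count_bins() {
    awk -v max="$2" '
        # filter first: only rows whose value lies in 1..max are counted
        $2 >= 1 && $2 <= max { a[$1 FS 10*int(($2-1)/10)+10]++ }
        END { for (k in a) print k, a[k] }
    ' "$1" | sort -k1,1 -k2,2n   # sort by identifier, then numerically by bin
}

count_bins file 20
```

With a limit of 20 the row with value 30 is dropped, so only the three lines from the question's expected output remain. Raising the limit to 100 keeps the mm6 30 bin as well.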

Explanation: We build a key, count how often each key occurs, and print every key with its count at the end. The key has two parts: the first is the identifier, the second maps the value to its bin. For example, to map 0-9 to 0 and 10-19 to 1, you can divide by 10 and keep the integer part. Your ranges start at 1 (1-10, 11-20, ...), so subtract one before dividing by 10. Your bins are labeled in multiples of 10, so multiply the result by 10. Finally, you label each bin by its upper bound, so add 10.
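To check the mapping at the bin edges, here is a small sketch that applies the same formula to a few boundary values. The helper name bin_of is hypothetical and only used for this illustration.

```shell
# bin_of VALUE  (hypothetical helper): print the bin label for one value,
# using the same expression as in the answer: 10*int((v-1)/10)+10
bin_of() {
    awk -v v="$1" 'BEGIN { print 10*int((v-1)/10)+10 }'
}

# Show the value and its bin for the edges of the first few ranges.
for v in 1 10 11 20 21 100; do
    echo "$v $(bin_of "$v")"
done
```

Values 1 and 10 both land in bin 10, 11 and 20 in bin 20, 21 in bin 30, and 100 in bin 100, which is the "upper bound" labeling described above.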

Upvotes: 3
