Chargaff

Reputation: 1572

awk average part of column if lines (specific field) match

Here is a sample of my input file:

$ cat NDVI-bm  
P01 031.RAW 0.516 0 0  
P01 021.RAW 0.449 0 0  
P02 045.RAW 0.418 0 0  
P03 062.RAW 0.570 0 0  
P03 064.RAW 0.469 0 0  
P04 083.RAW 0.636 0 0  
P04 081.RAW 0.592 0 0  
P04 082.RAW 0.605 0 0  
P04 084.RAW 0.648 0 0  
P05 093.RAW 0.748 0 0

I need to average column 3 over the lines whose first field matches. Simple enough, but I'm struggling as my awk knowledge is quite basic... Here is what I have so far:

awk '{array[$1]+=$3(need to divide here by number of matches...)} END { for (i in array) {print i"," array[i]}}' NDVI-bm

After searching the web, I'm really not sure I'm heading the right way... unless there is an easy way to count the number of matches, which I can't seem to find. Any ideas?

Thanks for any help!

Upvotes: 0

Views: 3009

Answers (3)

DigitalRoss

Reputation: 146043

{ total[$1] += $3; ++n[$1] }

END { for(i in total) print i, total[i] / n[i] }

Upvotes: 1

Fredrik Pihl

Reputation: 45634

E.g., to calculate the average of column 3 over the lines starting with "P01":

/^P01/{
    num+=1      # number of matching lines
    cnt+=$3     # running sum of column 3 (despite the name)
}
END {print "avg = " cnt/num}

Output:

$ awk -f avg.awk input
avg = 0.4825

...or, as a one-liner:

$ awk '/^P01/{cnt+=$3; num+=1} END{print "avg="cnt/num}' input
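One caveat with the single-key version: if no line matches, num stays 0 and the division in END fails (gawk, for example, aborts with a division-by-zero error). Below is a minimal sketch of a guarded variant. The function name avg_for_key, the key P99 and the "no lines matched" message are made up for illustration, and the key is compared against the first field with == rather than a regex.

```shell
# Hypothetical helper: average column 3 over lines whose first field equals the key.
avg_for_key() {
  awk -v key="$1" '$1 == key { sum += $3; num++ }
    END {
      # Only divide when at least one line matched.
      if (num) print "avg=" sum / num
      else     print "no lines matched " key
    }'
}

# A few lines taken from the question's sample data.
sample='P01 031.RAW 0.516 0 0
P01 021.RAW 0.449 0 0
P02 045.RAW 0.418 0 0'

hit=$(printf '%s\n' "$sample" | avg_for_key P01)
miss=$(printf '%s\n' "$sample" | avg_for_key P99)   # P99 is not in the data
printf '%s\n%s\n' "$hit" "$miss"
```

The first call should print avg=0.4825, matching the output above, and the second should take the guarded branch instead of dividing by zero.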

Or to do the calculations for all values of the first column simultaneously:

{
    sum[$1]+=$3
    cnt[$1]++
}


END {
    print "Name" "\t" "sum" "\t" "cnt" "\t" "avg"
    for (i in sum)
        print i "\t" sum[i] "\t" cnt[i] "\t" sum[i]/cnt[i]

}

Outputs:

$ awk -f avg.awk input
Name    sum     cnt     avg
P01     0.965   2       0.4825
P02     0.418   1       0.418
P03     1.039   2       0.5195
P04     2.481   4       0.62025
P05     0.748   1       0.748

Upvotes: 4

hmakholm left over Monica

Reputation: 23332

Have a different array where you keep track of the number of entries you have seen for each index, and do the division in the END block.
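Applied to the command from the question, that could look like the sketch below. The array names sum and count are arbitrary, the temp file just stands in for NDVI-bm so the snippet runs on its own, and sort is added only because the order of for (k in ...) is unspecified in awk.

```shell
# Sample data copied from the question; the temp file stands in for NDVI-bm.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
P01 031.RAW 0.516 0 0
P01 021.RAW 0.449 0 0
P02 045.RAW 0.418 0 0
P03 062.RAW 0.570 0 0
P03 064.RAW 0.469 0 0
P04 083.RAW 0.636 0 0
P04 081.RAW 0.592 0 0
P04 082.RAW 0.605 0 0
P04 084.RAW 0.648 0 0
P05 093.RAW 0.748 0 0
EOF

# sum[] accumulates column 3 per key; count[] records how many lines had that key.
# The division happens once per key in the END block.
result=$(awk '{ sum[$1] += $3; count[$1]++ }
     END { for (k in sum) print k "," sum[k] / count[k] }' "$tmp" | sort)

printf '%s\n' "$result"
rm -f "$tmp"
```

This keeps the comma-separated "key,average" output format from the original attempt.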

Upvotes: 0
