Finding the mean values of a field using awk?

Question

This is what I am trying to do:

find the mean values for x,y,z for the HETATM records , the x value are the 7 field, the y values are the 8 field, and z values are the 9 field.

I am trying to do this using this file http://pastebin.com/EqA2SUMy

Here is the sample

  HETATM 1756  O   HOH A 501      -0.923  10.560 127.393  1.00 16.58           O  
 HETATM 1757  O   HOH A 502       9.272  22.148 134.167  1.00 15.08           O  
 HETATM 1758  O   HOH A 503       0.109  20.243 112.094  1.00 20.74           O  
 HETATM 1759  O   HOH A 504      -3.930  10.522 125.779  1.00 20.79           O  
 HETATM 1760  O   HOH A 505      -0.759  36.323  88.018  1.00 17.42           O  
 HETATM 1761  O   HOH A 506      -4.645  51.936  81.852  1.00 21.43           O  
 HETATM 1762  O   HOH A 507      -3.900  17.103 128.596  1.00 14.08           O  
 HETATM 1763  O   HOH A 508       6.834  21.053 135.062  1.00 16.98           O

Can anyone show me how to do a script for this.

(this part is related to a comment viewers can ignore)

        ATOM    214  OE2 GLU A 460      -2.959  24.000 103.360  1.00 32.19           O  
     ATOM    215  N   ARG A 461      -5.878  28.748 106.473  1.00 22.68           N  
     ATOM    216  CA  ARG A 461      -6.553  30.043 106.524  1.00 24.34           C  
    ATOM    217  C   ARG A 461      -5.583  31.176 106.219  1.00 22.42           C  
    ATOM    218  O   ARG A 461      -5.918  32.121 105.497  1.00 25.07           O  
   ATOM    219  CB  ARG A 461      -7.222  30.272 107.887  1.00 24.53           C  
   ATOM    220  CG  ARG A 461      -8.425  29.394 108.150  1.00 26.38

Ed Morton · Accepted Answer

$ awk '{for (i=1;i<=3;i++) sum[i]+=$(i+6)}
    END{if (NR) for (i=1;i in sum;i++) print sum[i]/NR}' file
0.25725
23.736
116.62

The if (NR) is necessary to avoid a divide by zero error on an empty file.

If @jaypal is correct and you need to select just the input lines containing HETATM then change it to:

awk '/HETATM/{++nr; for (i=1;i<=3;i++) sum[i]+=$(i+6)}
    END{if (nr) for (i=1;i in sum;i++) print sum[i]/nr}' file

Finding the mean values of a field using awk?

Answers (2)

Related Questions