Reputation: 41
This is what I am trying to do:
find the mean values for x,y,z for the HETATM records , the x value are the 7 field, the y values are the 8 field, and z values are the 9 field.
I am trying to do this using this file http://pastebin.com/EqA2SUMy
Here is the sample
HETATM 1756 O HOH A 501 -0.923 10.560 127.393 1.00 16.58 O
HETATM 1757 O HOH A 502 9.272 22.148 134.167 1.00 15.08 O
HETATM 1758 O HOH A 503 0.109 20.243 112.094 1.00 20.74 O
HETATM 1759 O HOH A 504 -3.930 10.522 125.779 1.00 20.79 O
HETATM 1760 O HOH A 505 -0.759 36.323 88.018 1.00 17.42 O
HETATM 1761 O HOH A 506 -4.645 51.936 81.852 1.00 21.43 O
HETATM 1762 O HOH A 507 -3.900 17.103 128.596 1.00 14.08 O
HETATM 1763 O HOH A 508 6.834 21.053 135.062 1.00 16.98 O
Can anyone show me how to do a script for this.
(this part is related to a comment viewers can ignore)
ATOM 214 OE2 GLU A 460 -2.959 24.000 103.360 1.00 32.19 O
ATOM 215 N ARG A 461 -5.878 28.748 106.473 1.00 22.68 N
ATOM 216 CA ARG A 461 -6.553 30.043 106.524 1.00 24.34 C
ATOM 217 C ARG A 461 -5.583 31.176 106.219 1.00 22.42 C
ATOM 218 O ARG A 461 -5.918 32.121 105.497 1.00 25.07 O
ATOM 219 CB ARG A 461 -7.222 30.272 107.887 1.00 24.53 C
ATOM 220 CG ARG A 461 -8.425 29.394 108.150 1.00 26.38
Upvotes: 0
Views: 107
Reputation: 753525
It's not rocket science. (Updated to catch only HETATM records — a trivial change; you can use more exacting regexes if you need to. However, it is also necessary to count which records match and use that count, not NR
, since you're ignoring many records, in general.)
awk '/HETATM/ { sum7 += $7; sum8 += $8; sum9 += $9; count++ }
END { if (count > 0)
printf("avg(x) = %f, avg(y) = %f, avg(z) = %f\n",
sum7/count, sum8/count, sum9/count)
}'
And yes, you could put it all on one line but it wouldn't be as readable.
I can't answer for why it produced zeros for you; when run on the data from the question, wobbly line starts and all, it produced the output:
avg(x) = 0.257250, avg(y) = 23.736000, avg(z) = 116.620125
If you think there is a possibility of empty input (or, at least, no HETATM records in the input) and an error message is not acceptable, then you can protect the printing action with if (count > 0)
or equivalent (added to script). You can generate your own preferred output if ``count` is zero.
Upvotes: 0
Reputation: 203209
$ awk '{for (i=1;i<=3;i++) sum[i]+=$(i+6)}
END{if (NR) for (i=1;i in sum;i++) print sum[i]/NR}' file
0.25725
23.736
116.62
The if (NR)
is necessary to avoid a divide by zero error on an empty file.
If @jaypal is correct and you need to select just the input lines containing HETATM then change it to:
awk '/HETATM/{++nr; for (i=1;i<=3;i++) sum[i]+=$(i+6)}
END{if (nr) for (i=1;i in sum;i++) print sum[i]/nr}' file
Upvotes: 1