Reputation: 193
I am using bash + awk to extract some information from log files located within a directory and save the summary in a separate file. At the bottom of each log file there is a table like:
mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -6.961          0          0
   2       -6.797      2.908      4.673
   3       -6.639      27.93      30.19
   4       -6.204      2.949      6.422
   5       -6.111      24.92      28.55
   6       -6.058      2.836      7.608
   7       -5.986      6.448      10.53
   8       -5.95       19.32      23.99
   9       -5.927      27.63      30.04
  10       -5.916      27.17      31.29
  11       -5.895      25.88      30.23
  12       -5.835      26.24      30.36
From this I need to focus on the (negative) values located in the second column. Specifically, I need to take the first 10 values from the second column (from -6.961 to -5.916), compute their mean, and save the mean value together with the name of the log as one line in a new ranking.log, so for 5 processed logs it should look like:
# ranking_${output}.log
log_name1 -X.XXX
log_name2 -X.XXX
log_name3 -X.XXX
log_name4 -X.XXX
log_name5 -X.XXX
Where -X.XXX is the mean value computed over the first 10 positions of each log.
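For the example table above, that mean would be (-6.961 - 6.797 - 6.639 - 6.204 - 6.111 - 6.058 - 5.986 - 5.95 - 5.927 - 5.916) / 10 = -6.2549.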
Here is my awk code, integrated into a bash function, which extracts the first value (-6.961 in the example table) from each log, without the mean computation:
# take only the first line (lowest dG) from each log
take_the_first_value () {
    awk '$1=="1" { sub(/.*\//, "", FILENAME); sub(/\.log/, "", FILENAME); printf("%s: %s\n", FILENAME, $2) }' "${results}"/*.log > "${results}"/ranking.csv
}
How can I modify the AWK part to compute the MEAN value instead of always taking the value located in the first line of the table?
Upvotes: 0
Views: 62
Reputation: 204731
With GNU awk for ENDFILE:
$ cat tst.sh
#!/usr/bin/env bash
awk '
($2+0) < 0 {
    sum += $2
    if ( ++cnt == 10 ) {
        nextfile
    }
}
ENDFILE {
    print FILENAME, (cnt ? sum/cnt : 0)
    cnt = sum = 0
}
' "${@:--}"
$ ./tst.sh file
file -6.2549
Note that the above will work even if your input files have fewer than 10 lines at the end, including empty files.
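If GNU awk is not available, a minimal portable sketch of the same idea (my adaptation, not part of the answer above) is to flush the previous file's stats whenever a new file starts, plus once more in END. Note that, unlike ENDFILE, a completely empty file then produces no output line:
awk '
FNR == 1 && NR > 1 {          # a new file just started: report the previous one
    print prev, (cnt ? sum/cnt : 0)
    cnt = sum = 0
}
($2+0) < 0 && cnt < 10 {      # accumulate the first 10 negative affinities
    sum += $2
    cnt++
}
{ prev = FILENAME }
END { if (NR) print prev, (cnt ? sum/cnt : 0) }
' "${@:--}"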
Upvotes: 2
Reputation: 88999
I suggest, with GNU awk:
awk -v num=10 'BEGINFILE{ c=sum=0 }
    $1~/^[0-9]+$/ && NF==4 {
        c++; sum += $2
        if (c == num) {
            sub(/.*\//, "", FILENAME)
            print FILENAME, sum/num
        }
    }' "${results}"/*.log >> "${results}"/ranking.csv
I used $1~/^[0-9]+$/ && NF==4 to identify the correct lines.
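A small variation (my suggestion, not part of the answer above): since BEGINFILE already ties the script to GNU awk, you could add nextfile after the print so the remainder of each file is skipped once its mean has been written:
awk -v num=10 'BEGINFILE{ c=sum=0 }
    $1~/^[0-9]+$/ && NF==4 {
        c++; sum += $2
        if (c == num) {
            sub(/.*\//, "", FILENAME)
            print FILENAME, sum/num
            nextfile    # done with this file; skip straight to the next one
        }
    }' "${results}"/*.log >> "${results}"/ranking.csv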
Upvotes: 1
Reputation: 19271
This gives you the averages. The pattern used to find the first value is the separator line (matched by ^\-+\+\-+) followed by a [[:digit:]] in the first field of the next line. For each log file do
$ awk '$1~/[[:digit:]]/ && set==1 { x+=$2; i++
           gsub(/\/*.*\//, "", FILENAME)
           if (i==10) { set=0; print FILENAME, x/i; i=0; x=0 }
       }
       /^\-+\+\-+/{ set=1 }' "${results}"/*.log > "${results}"/ranking.csv
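Run against the example table, each output line is the log's filename (leading path stripped by the gsub) followed by -6.2549, the same mean as in the first answer.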
Upvotes: 1