Reputation: 23
I stumbled over a little problem which I am not able to solve with awk in a bash script.
I have the following data file:
33 1000 1.108932e-01 2.825803e+00 -9.955642e-05 0.0000e+00 0.0000e+00 8.012180e-02 4.081916e-02
0.0000e+00 7.8557e-01 6.1128e+01 4.0468e+00 -9.9558e-05 3.8526e-02 3.1874e-03 5.1303e-01 0.0000e+00
1.6667e-02 7.8530e-01 6.0977e+01 4.0552e+00 1.0627e-01 7.8951e-02 6.2521e-03 5.0750e-01 0.0000e+00
...
which has a header line with 10 elements, followed by an array with 33 rows and 9 columns.
I would like to use the data in this file to print the fourth parameter from the header line, followed by the average of column 3 (i.e. sum+=$3 / {number of lines}). At the moment, I try to do it like this:
gawk '{time=FNR==1{$4};if(NR>1)sum+=$3}; time = FNR == 1{$4} END {sum=sum/(NR-1); print time " " sum}' $tmpn.data >> $tmpn.vrms
It works fine for the average; however, the time parameter is not correct and I only get a 0 in return. Maybe I am only missing a small thing, but unfortunately I couldn't find anything online. What would be the best way to solve this issue?
Thanks for the help.
Cheers.
Upvotes: 2
Views: 109
Reputation: 37464
Another version in awk, using getline in a while loop to read until the end of the file and then output the header buffer b and the average:
$ awk 'NR==1{b=$4; while(getline==1){s+=$3;c++} print b,s/c}' data
4th 40.7386
It expects the data file to have a header line. Explained:
NR==1 {                    # read in the first line and ...
    b=$4                   # ... buffer the 4th field of the header
    while(getline==1) {    # then read while there are records to read
        s+=$3              # sum up the values in the 3rd field
        c++                # count the number of values, add if($3!="") if needed
    }
    print b, s/c           # after while output header and average
}
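The comment on c++ hints at skipping records whose 3rd field is empty. A minimal sketch of that variant follows. It assumes that blank data lines should be ignored and that an empty data section should print a placeholder instead of dividing by zero; neither requirement comes from the question, and the four input lines are made up for illustration.

```shell
# Hypothetical input: a header whose 4th field is "4th", two data rows, one blank line
result=$(printf '%s\n' 'h1 h2 h3 4th' 'a b 10' '' 'c d 20' |
  awk 'NR==1 {
         b = $4                            # buffer the 4th header field
         while ((getline) > 0) {           # read the remaining records
           if ($3 != "") { s += $3; c++ }  # only count records that have a 3rd field
         }
         if (c) { print b, s/c }           # average over the counted records
         else   { print b, "no data" }     # assumed placeholder for an empty data section
       }')
echo "$result"
```

With the sample input above, the blank line is skipped, so the average is taken over two values rather than three.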
Upvotes: 0
Reputation: 440556
Try:
awk 'NR==1 {time=$4;next} {sum+=$3} END {print time, (sum/(NR-1))}' $tmpn.data >>$tmpn.vrms
NR==1 {time=$4;next} is a pattern-action pair:
- NR==1 is only true for the first input line.
- {time=$4;next} is only executed for the first line; it stores the header's 4th field in variable time, then proceeds to the next record (line; next).
{sum+=$3}, which is processed for all remaining records (i.e., the data records), iteratively sums up the values in the 3rd field in variable sum.
END {print time, (sum/(NR-1))}:
- The END block is executed after all input records have been processed.
- {print time, (sum/(NR-1))} prints the header field and the average of the 3rd-field values, separated by the default output field separator (OFS), which is a space. Note that NR contains the total number of input records inside the END block.

A note on your solution attempt and awk's philosophy:
As (currently) stated, your command breaks, because you've enclosed the entire script in {...}.
Generally, awk's terse elegance comes from a sequence of carefully crafted pattern-action pairs.
Think of the pattern as the condition of an if statement with the "syntactic noise" removed, and the action as the body of that if statement:
<pattern> { <action-cmd1>; ... } is (conceptually) short for if (<pattern>) { <action-cmd1>; ... }
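As a rough illustration of that equivalence, the sketch below feeds the three sample lines from the question to the pattern-action command from this answer and to a hand-expanded version that spells out the if. The variable names data, pairs and explicit are only placeholders for this example.

```shell
# The header line and the two data rows shown in the question
data='33 1000 1.108932e-01 2.825803e+00 -9.955642e-05 0.0000e+00 0.0000e+00 8.012180e-02 4.081916e-02
0.0000e+00 7.8557e-01 6.1128e+01 4.0468e+00 -9.9558e-05 3.8526e-02 3.1874e-03 5.1303e-01 0.0000e+00
1.6667e-02 7.8530e-01 6.0977e+01 4.0552e+00 1.0627e-01 7.8951e-02 6.2521e-03 5.0750e-01 0.0000e+00'

# Pattern-action form: the pattern NR==1 guards the first action
pairs=$(printf '%s\n' "$data" |
  awk 'NR==1 {time=$4; next} {sum+=$3} END {print time, sum/(NR-1)}')

# Same logic with the pattern written out as an explicit if inside one action
explicit=$(printf '%s\n' "$data" |
  awk '{ if (NR==1) { time=$4; next } sum+=$3 } END { print time, sum/(NR-1) }')

echo "$pairs"      # 2.825803e+00 61.0525
echo "$explicit"   # same output as above
```

Both forms give the same result; the pattern-action form simply moves the condition out of the braces.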
In a given pair, you may omit either the action or the pattern:
- If you omit the pattern, the action is executed unconditionally (though the action may still not get to execute, if a previous pattern-action pair skipped further processing, such as with next or exit).
- If you omit the action, the default action is { print }, i.e., to print the (potentially modified) current record.
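A small sketch of both omissions, using three made-up input lines (header, row1, row2) and placeholder variable names:

```shell
input=$(printf '%s\n' 'header' 'row1' 'row2')

# Pattern without an action: the default action { print } runs for records 2 and up
no_action=$(printf '%s\n' "$input" | awk 'NR > 1')

# Action without a pattern: it would run for every record,
# but the earlier pair calls next on the first record, so that one is skipped
no_pattern=$(printf '%s\n' "$input" | awk 'NR == 1 { next } { print NR ": " $0 }')

echo "$no_action"    # row1 and row2, one per line
echo "$no_pattern"   # "2: row1" and "3: row2", one per line
```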
Because the default action is { print }, you will often see the idiom 1 used to simply print the current record: 1 is a pattern that, in the Boolean context in which patterns are evaluated, is always true, and, in the absence of an associated action, the current record is printed by default.
Upvotes: 3