Print certain parameter in column with awk

Question

I stumbled over a little problem, which I am not able to solve with awk in a bash script.

I have the following data file:

 33   1000   1.108932e-01   2.825803e+00  -9.955642e-05    0.0000e+00       0.0000e+00    8.012180e-02 4.081916e-02

 0.0000e+00   7.8557e-01   6.1128e+01   4.0468e+00  -9.9558e-05   3.8526e-02   3.1874e-03   5.1303e-01   0.0000e+00

 1.6667e-02   7.8530e-01   6.0977e+01   4.0552e+00   1.0627e-01   7.8951e-02   6.2521e-03   5.0750e-01   0.0000e+00

...

which has a header line with 10 elements, followed by an array with 33 rows and 9 columns.

I would like to use the data in this file to print out the fourth parameter from the header line, followed by the average of column 3 (i.e. sum+=$3 / {Number of lines}). At the moment, I am trying to do it like this:

gawk '{time=FNR==1{$4};if(NR>1)sum+=$3}; time = FNR == 1{$4} END {sum=sum/(NR-1); print time " " sum}' $tmpn.data >> $tmpn.vrms

It works fine for the average; however, the time parameter is not correct and I only get a 0 in return. Maybe I am only missing a small thing but, unfortunately, I couldn't find anything online. What would be the best way to solve this issue?

Thanks for the help.

Cheers.

mklement0 · Accepted Answer

Try:

awk 'NR==1 {time=$4;next} {sum+=$3} END {print time, (sum/(NR-1))}' $tmpn.data >>$tmpn.vrms

NR==1 {time=$4;next} is a pattern-action pair:
- Pattern (condition) NR==1 is only true for the first input line.
- Thus, action {time=$4;next} is only executed for the first line: it stores the header's 4th field in variable time, then skips straight to the next record (line) by way of next, so the remaining pattern-action pairs never see the header.
{sum+=$3}, which is processed for all remaining records (i.e., the data records), accumulates the values of the 3rd field in variable sum.
END {print time, (sum/(NR-1))}:
- The END block is executed after all input records have been processed.
- {print time, (sum/(NR-1))} prints the header field and the average of the 3rd-field values, separated by the default output field separator (OFS), which is a space. Note that NR contains the total number of input records inside the END block.
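
To see the command at work, here is a minimal sketch that runs it against a small sample. The temporary file and the variable names data and result are hypothetical; the sample contains only the header and the first two data rows from the question, and assumes the file has no blank lines between records (blank lines would be counted in NR and skew the average).

```shell
# Hypothetical sample file: the header plus the first two data rows from the question.
data=$(mktemp)
cat > "$data" <<'EOF'
 33   1000   1.108932e-01   2.825803e+00  -9.955642e-05    0.0000e+00       0.0000e+00    8.012180e-02 4.081916e-02
 0.0000e+00   7.8557e-01   6.1128e+01   4.0468e+00  -9.9558e-05   3.8526e-02   3.1874e-03   5.1303e-01   0.0000e+00
 1.6667e-02   7.8530e-01   6.0977e+01   4.0552e+00   1.0627e-01   7.8951e-02   6.2521e-03   5.0750e-01   0.0000e+00
EOF

# Same program as above: store header field 4, then average field 3 over the data rows.
result=$(awk 'NR==1 {time=$4; next} {sum+=$3} END {print time, (sum/(NR-1))}' "$data")
echo "$result"   # prints: 2.825803e+00 61.0525
rm -f "$data"
```

The header field is printed exactly as it appears in the file because it is never modified, whereas the average is a computed number and is therefore formatted with awk's default output format.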

A note on your solution attempt and awk's philosophy:

As (currently) stated, your command breaks, because you've enclosed the entire script in {...}.
Generally, awk's terse elegance comes from a sequence of carefully crafted pattern-action pairs.
- A pattern is a condition (Boolean expression); the associated action (a sequence of statements) is executed only if the condition is true.
- Think of the pattern as the conditional part of an if statement with the "syntactic noise" removed, and the action as the body of that if statement:
  pattern { statement; ... } is (conceptually) short for if (pattern) { statement; ... }
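
As a rough illustration of that equivalence, the sketch below runs the same filter twice, once as a pattern-action pair and once with the condition moved into an explicit if. The three-line input and the variable names are made up for the example.

```shell
# Hypothetical three-line input: a label and a number per record.
input=$(printf 'a 1\nb 2\nc 3\n')

# Pattern-action form: the condition sits in front of the braces.
with_pattern=$(printf '%s\n' "$input" | awk '$2 > 1 {print $1}')

# Explicit form: no pattern, so the action runs for every record,
# and the condition moves into an if statement inside the action.
with_if=$(printf '%s\n' "$input" | awk '{ if ($2 > 1) print $1 }')

echo "$with_pattern"   # prints b and c, one per line
echo "$with_if"        # prints the same two lines
```

Both forms select the same records; the pattern form is simply the more compact spelling.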
In a given pair, you may either omit the action, or the pattern:
- If you omit the pattern, the action is executed unconditionally (though the action may still not get to execute, if a previous pattern-action pair skipped further processing, such as with next or exit).
- If you omit the action, the default action is { print }, i.e., to print the (potentially modified) current record.
  - This behavior enables the common shorthand 1 to simply print the current record: 1 is a pattern that, in the Boolean context in which patterns are evaluated, is always true, and, in the absence of an associated action, the current record is printed by default.
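
The three cases can be tried out with a short sketch like the following; the two-line input and the variable names are hypothetical and only serve the demonstration.

```shell
# Hypothetical two-line input.
input=$(printf 'x 10\ny 20\n')

# Action omitted: records for which the pattern is true are printed by default.
only_pattern=$(printf '%s\n' "$input" | awk '$2 > 10')

# Pattern omitted: the action runs for every record.
only_action=$(printf '%s\n' "$input" | awk '{print $1}')

# The 1 shorthand: the first block doubles field 2 of every record, then the
# always-true pattern 1 prints the (now modified) record via the default action.
shorthand=$(printf '%s\n' "$input" | awk '{$2 = $2 * 2} 1')

echo "$only_pattern"   # prints: y 20
echo "$only_action"    # prints x and y, one per line
echo "$shorthand"      # prints "x 20" and "y 40", one per line
```

Note that assigning to a field makes awk rebuild the record with the output field separator, which is why the shorthand prints the doubled values.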
