Lukas Fuchs

Reputation: 23

Print certain parameter in column with awk

I stumbled over a little problem, which I am not able to solve with awk in a bash script.

I have the following data file:

 33   1000   1.108932e-01   2.825803e+00  -9.955642e-05    0.0000e+00       0.0000e+00    8.012180e-02 4.081916e-02

 0.0000e+00   7.8557e-01   6.1128e+01   4.0468e+00  -9.9558e-05   3.8526e-02   3.1874e-03   5.1303e-01   0.0000e+00

 1.6667e-02   7.8530e-01   6.0977e+01   4.0552e+00   1.0627e-01   7.8951e-02   6.2521e-03   5.0750e-01   0.0000e+00

...

which has a header line with 10 elements, followed by an array with 33 rows and 9 columns.

I would like to use the data in this file to print the fourth parameter from the header line, followed by the average of column 3 (i.e. sum+=$3 / {Number of lines}). At the moment, I am trying to do it like this:

gawk '{time=FNR==1{$4};if(NR>1)sum+=$3}; time = FNR == 1{$4} END {sum=sum/(NR-1); print time " " sum}' $tmpn.data >> $tmpn.vrms

It works fine for the average; however, the time parameter is not correct, and I only get 0 in return. Maybe I am missing only a small thing, but unfortunately I couldn't find anything online. What would be the best way to solve this issue?

Thanks for the help.

Cheers.

Upvotes: 2

Views: 109

Answers (2)

James Brown

Reputation: 37464

Another awk version, which uses getline in a while loop to read records until the end of the file and then outputs the buffered header field b and the average:

$ awk 'NR==1{b=$4; while(getline==1){s+=$3;c++} print b,s/c}' data
4th 40.7386

It expects the datafile to have a header line. Explained:

NR==1 {                  # read in the first line and ...
    b=$4                 # ... buffer the 4th field of the header 
    while(getline==1) {  # then read while there are records to read
        s+=$3            # sum up the values in the 3rd field
        c++              # count the number of values, add if($3!="") if needed
    } 
    print b, s/c         # after while output header and average
}
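The comment above hints at skipping rows without a 3rd field. A minimal sketch of that variant follows, assuming a made-up four-line input (the header and rows here are hypothetical, not the asker's data). It also guards against dividing by zero when no data row qualifies, and uses the common `(getline) > 0` loop test, which for this purpose behaves like `getline==1`.

```shell
# Hypothetical input: one header line and three rows, one of which has no 3rd field.
result=$(printf '%s\n' 'h1 h2 h3 4th' 'a b 10' 'c d' 'e f 20' |
  awk 'NR==1 {
         b = $4                              # buffer the 4th header field
         while ((getline) > 0) {             # read the remaining records
           if ($3 != "") { s += $3; c++ }    # skip rows with no 3rd field
         }
         if (c > 0) print b, s/c             # avoid dividing by zero
       }')
echo "$result"   # prints: 4th 15
```

Only the two rows that have a 3rd field are counted, so the average is (10 + 20) / 2.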

Upvotes: 0

mklement0

Reputation: 440556

Try:

awk 'NR==1 {time=$4;next} {sum+=$3} END {print time, (sum/(NR-1))}' $tmpn.data >>$tmpn.vrms
  • NR==1 {time=$4;next} is a pattern-action pair:

    • Pattern (condition) NR==1 is only true for the first input line.
    • Thus, action {time=$4;next} is only executed for the first line: it stores the header's 4th field in variable time, then skips straight to the next record (next), so the header never reaches the summing action.
  • {sum+=$3}, which is processed for all remaining records (i.e., the data records), iteratively sums up the values in the 3rd field in variable sum.

  • END {print time, (sum/(NR-1))}:

    • The END block is executed after all input records have been processed.
    • {print time, (sum/(NR-1))} prints the header field and the average of the 3rd-field values, separated by the default output field separator (OFS), which is a space. Note that NR contains the total number of input records inside the END block.
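To try the command in isolation, here is a small sketch that runs it on the header and the two data rows quoted in the question. The temporary file and the variable names data and result are only for this illustration; the real file has 33 rows, so the average will differ.

```shell
# Sample file: the header plus the two data rows shown in the question.
data=$(mktemp)
cat > "$data" <<'EOF'
 33   1000   1.108932e-01   2.825803e+00  -9.955642e-05    0.0000e+00       0.0000e+00    8.012180e-02 4.081916e-02
 0.0000e+00   7.8557e-01   6.1128e+01   4.0468e+00  -9.9558e-05   3.8526e-02   3.1874e-03   5.1303e-01   0.0000e+00
 1.6667e-02   7.8530e-01   6.0977e+01   4.0552e+00   1.0627e-01   7.8951e-02   6.2521e-03   5.0750e-01   0.0000e+00
EOF

# Header field 4 is stored, then column 3 of the data rows is averaged.
result=$(awk 'NR==1 {time=$4; next} {sum+=$3} END {print time, (sum/(NR-1))}' "$data")
echo "$result"   # prints: 2.825803e+00 61.0525
rm -f "$data"
```

The header field is printed exactly as it appears in the file, while the average is formatted with awk's default numeric output format.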

A note on your solution attempt and awk's philosophy:

  • As (currently) stated, your command breaks, because you've enclosed the entire script in {...}.

  • Generally, awk's terse elegance comes from a sequence of carefully crafted pattern-action pairs.

    • A pattern is a condition (Boolean expression); the associated action (a sequence of statements) is executed only if the condition is true.
    • Think of the pattern as the conditional part of an if statement with the "syntactic noise" removed, and the action as the body of that if statement:
      <pattern> { <action-cmd1>; ... } is (conceptually) short for if (<pattern>) { <action-cmd1>; ... }
  • In a given pair, you may omit either the action or the pattern:

    • If you omit the pattern, the action is executed unconditionally (though the action may still not get to execute, if a previous pattern-action pair skipped further processing, such as with next or exit).

    • If you omit the action, the default action is { print }, i.e., to print the (potentially modified) current record.

      • This behavior enables the common shorthand 1 to simply print the current record: 1 is a pattern that, in the Boolean context in which patterns are evaluated, is always true, and, in the absence of an associated action, the current record is printed by default.
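The rules above can be sketched on a tiny, made-up three-line input (the data and variable names are purely illustrative):

```shell
# Hypothetical input: three records with a name and a number.
input=$(printf '%s\n' 'a 1' 'b 2' 'c 3')

# Pattern plus action, and the conceptually equivalent explicit if
# inside an action that has no pattern.
with_pattern=$(printf '%s\n' "$input" | awk '$2 > 1 { print $1 }')
with_if=$(printf '%s\n' "$input" | awk '{ if ($2 > 1) { print $1 } }')

# Pattern only: the default action { print } prints each matching record.
pattern_only=$(printf '%s\n' "$input" | awk '$2 > 1')

# Shorthand 1: an always-true pattern with no action, so every
# (here: modified) record is printed.
shorthand=$(printf '%s\n' "$input" | awk '{ $2 = $2 * 10 } 1')

printf '%s\n' "$with_pattern" "$with_if" "$pattern_only" "$shorthand"
```

The first two commands produce the same output (b and c), the third prints the full matching records, and the last prints all three records with the second field multiplied by 10.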

Upvotes: 3
