Will
Will

Reputation: 4469

Gawk-ing an average always returns 0

I am new to Awk, and am having trouble despite this being a common and seemingly simple problem.

I am trying to get the average of a column, but my addition seems to be not working. My script:

BEGIN {FS = ","}
{
        AgentDC1 = $4;
        AgentDC2 = $5;
        AutoDC1 = $23;
        AutoDC2 = $24;
        CallDuration = $28;
        CallDurationMinutes = $27;
        CallStart = $33;
        ConnectTime = $35;

        num = (CallDuration ? CallDuration : 0)
        print num
        sum += num;
}
END {print sum;}

When run, it prints the values (in quotes, it this normal?) but then prints the average as 0 (without quotes). For example:

$ awk -f search.awk callrecords.csv
"644.0"
"149.0"
"397.0"
...
""
"117.0"
"165.0"
""
0

So empty slots are being printed as "", and nothing is being added to sum. I hate to post HOW DO questions, but I am really stuck here, none of the other SOs I found were illuminating.

Upvotes: 1

Views: 198

Answers (2)

rici
rici

Reputation: 241881

I suppose that the quotes are actually present in the data file. Awk will not remove them magically.

In awk, when you use a variable as though it were a number, awk just ignores the characters in the variable, starting with the first one which can't be part og a number. If nothing is left of the variable's value, awk uses the value 0.

Assuming that all your fields actually contain quotation marks, the value of num will start with a quote, so using it as a number will result in the value 0. It still prints out ok, because it is printed as a string.


Here is a gawk solution which can also deal with fields which contain commas. The FPAT regex was modified from the gawk manual, while the function fix was adapted from some code on the same page. Both assume the "normal" CSV convention that quotes in quoted fields are doubled. (As @EdMorton points out in a comment, embedded newlines will not be handled correctly.)

function fix(x) {
    if (substr(x, 1, 1) == "\"")
      return gensub(/""/, "\"", "g",
                    substr(x, 2, length(x) - 2))
    else
      return x
}
BEGIN {
    FPAT = "([^,\"][^,]*|(\"[^\"]*\")+)?
}
{
    AgentDC1 = fix($4)
    AgentDC2 = fix($5)
    AutoDC1 = fix($23)
    AutoDC2 = fix($24)
    CallDuration = fix($28)
    CallDurationMinutes = fix($27)
    CallStart = fix($33)
    ConnectTime = fix($35)
    # Unlike the original, this casts num to a number.
    # It's unnecessary. sum += CallDuration; would be just fine.
    num = CallDuration+0
    print num
    sum += num
}
END {print sum+0}

Upvotes: 2

Ed Morton
Ed Morton

Reputation: 204229

You have quotes in your input data. Try this:

BEGIN {FS = "\"?,\"?"}
{
        gsub(/^"|"$/,"")
        AgentDC1 = $4
        AgentDC2 = $5
        AutoDC1 = $23
        AutoDC2 = $24
        CallDuration = $28
        CallDurationMinutes = $27
        CallStart = $33
        ConnectTime = $35

        num = (CallDuration ? CallDuration : 0)
        print num
        sum += num
}
END {print sum+0}

The above won't work if you have commas inside your fields.

Upvotes: 2

Related Questions