Reputation: 4469
I am new to Awk, and am having trouble despite this being a common and seemingly simple problem.
I am trying to get the average of a column, but my addition seems to be not working. My script:
BEGIN {FS = ","}
{
AgentDC1 = $4;
AgentDC2 = $5;
AutoDC1 = $23;
AutoDC2 = $24;
CallDuration = $28;
CallDurationMinutes = $27;
CallStart = $33;
ConnectTime = $35;
num = (CallDuration ? CallDuration : 0)
print num
sum += num;
}
END {print sum;}
When run, it prints the values (in quotes, it this normal?) but then prints the average as 0 (without quotes). For example:
$ awk -f search.awk callrecords.csv
"644.0"
"149.0"
"397.0"
...
""
"117.0"
"165.0"
""
0
So empty slots are being printed as ""
, and nothing is being added to sum. I hate to post HOW DO questions, but I am really stuck here, none of the other SOs I found were illuminating.
Upvotes: 1
Views: 198
Reputation: 241881
I suppose that the quotes are actually present in the data file. Awk will not remove them magically.
In awk, when you use a variable as though it were a number, awk just ignores the characters in the variable, starting with the first one which can't be part og a number. If nothing is left of the variable's value, awk uses the value 0.
Assuming that all your fields actually contain quotation marks, the value of num
will start with a quote, so using it as a number will result in the value 0. It still prints out ok, because it is printed as a string.
Here is a gawk
solution which can also deal with fields which contain commas. The FPAT
regex was modified from the gawk manual, while the function fix
was adapted from some code on the same page. Both assume the "normal" CSV convention that quotes in quoted fields are doubled. (As @EdMorton points out in a comment, embedded newlines will not be handled correctly.)
function fix(x) {
if (substr(x, 1, 1) == "\"")
return gensub(/""/, "\"", "g",
substr(x, 2, length(x) - 2))
else
return x
}
BEGIN {
FPAT = "([^,\"][^,]*|(\"[^\"]*\")+)?
}
{
AgentDC1 = fix($4)
AgentDC2 = fix($5)
AutoDC1 = fix($23)
AutoDC2 = fix($24)
CallDuration = fix($28)
CallDurationMinutes = fix($27)
CallStart = fix($33)
ConnectTime = fix($35)
# Unlike the original, this casts num to a number.
# It's unnecessary. sum += CallDuration; would be just fine.
num = CallDuration+0
print num
sum += num
}
END {print sum+0}
Upvotes: 2
Reputation: 204229
You have quotes in your input data. Try this:
BEGIN {FS = "\"?,\"?"}
{
gsub(/^"|"$/,"")
AgentDC1 = $4
AgentDC2 = $5
AutoDC1 = $23
AutoDC2 = $24
CallDuration = $28
CallDurationMinutes = $27
CallStart = $33
ConnectTime = $35
num = (CallDuration ? CallDuration : 0)
print num
sum += num
}
END {print sum+0}
The above won't work if you have commas inside your fields.
Upvotes: 2