sirabhorn
sirabhorn

Reputation: 33

awk: calculating sum from values in single field with multiple delimiters

Related to another post I had... parsing a sql string for integer values with multiple delimiters, In which I say I can easily accomplish the same with UNIX tools (ahem). I found it a bit more messy than expected. I'm looking for an awk solution. Any suggestions on the following?

Here is my original post, paraphrased:

#

I want to use awk to parse data sourced from a flat file that is pipe delimited. One of the fields is sub-formatted as follows. My end state is to sum the integers within the field, but my question here is to see of ways to use awk to sum the numeric values in the field. The pattern of the sub-formatting will always be where the desired integers will be preceded by a tilde (~) and followed by an asterisk (*), except for the last one in field. The number of sub fields may vary too (my example has 5, but there could more or less). The 4 char TAG name is of no importance.

So here is a sample:

|GADS~55.0*BILK~0.0*BOBB~81.0*HETT~32.0*IGGR~51.0|

From this example, all I would want for processing is the final number of 219. Again, I can work on the sum part as a further step; just interested in getting the numbers.

#

My solution currently entails two awk statements. First using gsub to replace the '~' with a '*' delimiter in my target field, 77:

awk -F'|' 'BEGIN {OFS="|"} { gsub("~", "*", $77) ; print }' file_1 > file_2

My second awk statement is to calculate the numeric sums on the target field, 77, which is the last field, and replace it with the calculated value. It is built on the assumption that there will be no other asterisks (*) anywhere else in the file. I'm okay with that. It is working for most examples, but not others, and my gut tells me this isn't that robust of an answer. Any ideas? The suggestions on my other post for SQL were great, but I couldn't implement them for unrelated silly reasons.

awk -F'*' '{if (NF>=2) {s=0; for (i=1; i<=NF; i++) s=s+$i; print substr($1, 1, length($1)-4) s;} else print}' file_2 > file_3

Upvotes: 0

Views: 301

Answers (1)

Kent
Kent

Reputation: 195199

To get the sum (219) from your example, you can use this:

awk -F'[^0-9.]+' '{for(i=1;i<=NF;i++)s+=$i;print s}' file

or the following for 219.00 :

awk -F'[^0-9.]+' '{for(i=1;i<=NF;i++)s+=$i;printf "%.2f\n", s}' file

Upvotes: 2

Related Questions