Stuff Compiler
Stuff Compiler

Reputation: 49

In a CSV file, subtotal 2 columns based on a third one, using AWK in KSH

Disclaimers:

    1) English is my second language, so please forgive any grammatical horrors you may find. I am pretty confident you will be able to understand what I need despite these.
    2) I have found several examples in this site that address questions/problems similar to mine, though I was unfortunately not able to figure out the modifications that would need to be introduced to fit my needs.

The "Problem":

I have an CSV file that looks like this:

c1,c2,c3,c4,c5,134.6,,c8,c9,SERVER1,c11
c1,c2,c3,c4,c5,0,,c8,c9,SERVER1,c11
c1,c2,c3,c4,c5,0.18,,c8,c9,SERVER2,c11
c1,c2,c3,c4,c5,0,,c8,c9,SERVER2,c11
c1,c2,c3,c4,c5,416.09,,c8,c9,SERVER3,c11
c1,c2,c3,c4,c5,0,,c8,c9,SERVER3,c11
c1,c2,c3,c4,c5,12.1,,c8,c9,SERVER3,c11
c1,c2,c3,c4,c5,480.64,,c8,c9,SERVER4,c11
c1,c2,c3,c4,c5,,83.65,c8,c9,SERVER5,c11
c1,c2,c3,c4,c5,,253.15,c8,c9,SERVER6,c11
c1,c2,c3,c4,c5,,18.84,c8,c9,SERVER7,c11
c1,c2,c3,c4,c5,,8.12,c8,c9,SERVER7,c11
c1,c2,c3,c4,c5,,22.45,c8,c9,SERVER7,c11
c1,c2,c3,c4,c5,,117.81,c8,c9,SERVER8,c11
c1,c2,c3,c4,c5,,96.34,c8,c9,SERVER9,c11

Complementary facts:

    1) File has 11 columns.
    2) The data in columns 1, 2, 3, 4, 5, 8, 9 and 11 is irrelevant in this case. In other words, I will only work with columns 6, 7 and 10.
    3) Column 10 will be typically alphanumeric strings (server names), though it may contain also "-" and/or "_".
    4) Columns 6 and 7 will have exclusively numbers, with up to two decimal places (A possible value is 0). Only one of the two will have data per line, never both.

What I need as an output:

    - A single occurrence of every string in column 10 (as column 1), then the sum (subtotal) of it's values in column 6 (as column 2) and last, the sum (subtotal) of it's values in column 7 (as column 3).
    - If the total for a field is "0" the field must be left empty, but still must exist (it's respective comma has to be printed).
    - **Note** that the strings in column 10 will be already alphabetically sorted, so there is no need to do that part of the processing with AWK.

Output sample, using the sample above as an input:

SERVER1,134.6,,
SERVER2,0.18,,
SERVER3,428.19,,
SERVER4,480.64,,
SERVER5,,83.65
SERVER6,,253.15
SERVER7,,26.96

I've already found within these pages not one, but two AWK oneliners that PARTIALLY accomplish what it need:

awk -F "," 'NR==1{last=$10; sum=0;}{if (last != $10) {print last "," sum; last=$10; sum=0;} sum += $6;}END{print last "," sum;}' inputfile


awk -F, '{a[$10]+=$6;}END{for(i in a)print i","a[i];}' inputfile

My "problems" in both cases are the same:

    - Subtotals of 0 are printed.
    - I can only handle the sum of one column at a time. Whenever I try to add the second one, I get either a syntax error or it does simply not print the third column at all.

Thanks in advance for your support people! Regards, Martín

Upvotes: 0

Views: 135

Answers (2)

Ed Morton
Ed Morton

Reputation: 203655

Your posted expected output doesn't seem to match your posted sample input so we're guessing but this might be what you're looking for:

$ cat tst.awk
BEGIN { FS=OFS="," }
$10 != prev {
    if (NR > 1) {
        print prev, sum6, sum7
    }
    sum6 = sum7 = ""
    prev = $10
}
$6  { sum6 += $6 }
$7  { sum7 += $7 }
END { print prev, sum6, sum7 }

$ awk -f tst.awk file
SERVER1,134.6,
SERVER2,0.18,
SERVER3,428.19,
SERVER4,480.64,
SERVER5,,83.65
SERVER6,,253.15
SERVER7,,49.41
SERVER8,,117.81
SERVER9,,96.34

Upvotes: 1

karakfa
karakfa

Reputation: 67507

something like this?

$ awk 'BEGIN{FS=OFS=","} 
            {s6[$10]+=$6; s7[$10]+=$7} 
         END{for(k in s6) print k,(s6[k]?s6[k]:""),(s7[k]?s7[k]:"")}' file | sort

SERVER1,134.6,
SERVER2,0.18,
SERVER3,428.19,
SERVER4,480.64,
SERVER5,,83.65
SERVER6,,253.15
SERVER7,,49.41
SERVER8,,117.81
SERVER9,,96.34

note that your treatment of commas is not consistent, you're adding an extra one when the last field is zero (count the commas)

Upvotes: 1

Related Questions