VNA
VNA

Reputation: 625

awk running total count and sum (Cont)

In continuation of previous post , how to calculate 80%-20% rule contribution of vendors on Daily basis ($1) AND Region ($1) wise.

The input file is alredy sorted based on Date & Region and Amount from highest to lowest

Input.csv

Date,Region,Vendor,Amount
5-Apr-15,east,aa,123
5-Apr-15,east,bb,50
5-Apr-15,east,cc,15
5-Apr-15,south,dd,88
5-Apr-15,south,ee,40
5-Apr-15,south,ff,15
5-Apr-15,south,gg,10
7-Apr-15,east,ii,90
7-Apr-15,east,jj,20

In the above input, based on Date($1) AND Region ($2) field need to populate Running Sum of Amount then calculate percentage of Running Sum of Amount for the day & Region

Date,Region,Vendor,Amount,RunningSum,%RunningSum
5-Apr-15,east,aa,123,123,65%
5-Apr-15,east,bb,50,173,92%
5-Apr-15,east,cc,15,188,100%

5-Apr-15,south,dd,88,88,58%
5-Apr-15,south,ee,40,128,84%
5-Apr-15,south,ff,15,143,93%
5-Apr-15,south,gg,10,153,100%

7-Apr-15,east,ii,90,90,82%
7-Apr-15,east,jj,20,110,100%

Once it is derived 80% or first hit of 80%above need to consider as 80% contribution remaining line items need to be consider as 20% contribution.

Date,Region,Countof80%Vendor, SumOf80%Vendor, Countof20%Vendor, SumOf20%Vendor
5-Apr-15,east,2,173,1,15
5-Apr-15,south,2,128,2,25
7-Apr-15,east,1,90,1,20

Upvotes: 0

Views: 177

Answers (1)

ShellFish
ShellFish

Reputation: 4551

This awk script will help you do the first part, ask if you need clarification. Basically it stores the values in arrays and prints out the requested info after parsing the document.

awk -F',' 'BEGIN{OFS=FS}
    NR==1{print $0, "RunningSum", "%RunningSum"}
    NR!=1{  
        if (date == $1 && region == $2) {
            counts[i]++
            cities[i][counts[i]] = $3
            amounts[i][counts[i]] = $4
            rsum[i][counts[i]] = rsum[i][counts[i] - 1] + $4
        } else {
            date = $1; region = $2
            dates[++i] = $1 
            regions[i] = $2
            counts[i] = 1
            cities[i][1] = $3
            amounts[i][1] = $4
            rsum[i][1] = $4
        }
    }
    END{
        for(j=1; j<=i; j++) {
            total = rsum[j][counts[j]];
            for (k=1; k<=counts[j]; k++) {
                print dates[j], regions[j], cities[j][k], amounts[j][k], rsum[j][k], int(rsum[j][k]/total*100) "%"
            }
            if (j != i) { print "" }
        }
    }' yourfilename

The second part can be done like this (using the output of the first awk script):

awk -F'[,%]' 'BEGIN{ OFS="," }
    NR==1 || $0 ~ /^$/ {
        over = ""
        record = 1
    }
    ! (NR==1 || $0 ~ /^$/) {
        if (record) {
            dates[++i] = $1
            regions[i] = $2
            record = ""
        }
        if (over) {
            twenty[i]++
            twenties[i] += $4
        } else {
            eighty[i]++
            eighties[i] += $4
        }
        if ($6 >= 80) {
            over = 1
        }
    }
    END {
        print "Date","Region","Countof80%Vendor", "SumOf80%Vendor", "Countof20%Vendor", "SumOf20%Vendor"
        for (j=1; j<=i; j++) {
            print dates[j], regions[j], eighty[j], eighties[j], twenty[j], twenties[j]
        }
    }' output/file/of/first/script

Upvotes: 1

Related Questions