user2642751
user2642751

Reputation: 43

comparing more than 2 columns in a csv file and create new column in the csv file

I have 2 csv files( 1.csv and 2.csv). Number of columns in both the files are same.

1.csv

Server,Path,I/P Backlog,O/P Backlog
Server1,Yes,3000,200
Server3,Yes,8,0
Server5,Yes,0,0


2.csv

Server,Path,I/P Backlog,O/P Backlog
Server1,Yes,3300,180
Server3,Yes,7,0
Server5,Yes,0,0
Server6,Yes,10,200

I have to compare 3 different columns in both csv.

IN above example, if server name is same, then find the deviation percent of I/P backlog and o/p backlog and create 2 separate columns for input trend and output trend.

if value is same: Nochange. if value has increased then it should increase % (up) if value has decreased: decrease % (down) if new row has been added: New if any row doesnt exist in second report , nothing needs to be done.

So my final sheet should show as below.

Server  Path    I/P Backlog Input Trend O/P Backlog Output Trend
Server1 Yes 3300    10%(up) 180 20%(down)
Server3 Yes 7   20%(down)   0   No change
Server5 Yes 0   No change   0   No change
Server6 Yes 10  New 200 New

Upvotes: 0

Views: 165

Answers (1)

RomanPerekhrest
RomanPerekhrest

Reputation: 92884

Extended awk solution:

awk 'function percent(v){ 
         if (v !~ /^[0-9]+(\.[0-9]+)?$/){ return v } 
         coef=1; dev="down"; 
         if (v >= 1){ coef=-1; dev="up" } 
         return sprintf("%d%(%s)",(100-v*100) * coef, dev) 
    }
    NR==1{ 
        $4=$4 FS "Input Trend"; 
        $6=$6 FS "Output Trend"; 
        print 
    }
    FNR==1{ next }{ k=$1 FS $2 }
    NR==FNR{ a[k]=$3 FS $4; next }
    k in a{ 
        split(a[k], vals); 
        if ($3==vals[1]) ip_diff="No change"; 
        else if (!vals[1]) ip_diff=$3"(up)"; 
        else ip_diff=$3/vals[1]; 
        if ($4==vals[2]) op_diff="No change"; 
        else if (!vals[2]) op_diff=$4"(up)"; 
        else op_diff=$4/vals[2]; 
        print k, $3, percent(ip_diff), $4, percent(op_diff); next 
    }
    { print $1,$2,$3,"New",$4,"New" }' f1.csv f2.csv

The output:

Server Path I/P Backlog Input Trend O/P Backlog Output Trend
Server1 Yes 3300 10%(up) 180 10%(down)
Server3 Yes 7 12%(down) 0 No change
Server5 Yes 0 No change 0 No change
Server6 Yes 10 New 200 New

Upvotes: 1

Related Questions