Pleb
Pleb

Reputation: 31

calculating on specific fields with awk

I got a csv file with that kind of informations :

2013  Cat.1  10  Structure1  Code1  34.10
2014  Cat.1  25  Structure1  Code1  254.24
2013  Cat.2  250 Structure1  Code1  2456.4
2014  Cat.2  234 Structure1  Code1  2345.9
2013  Cat.1  5   Structure2  Code2  59
2013  Cat.1  1   Structure2  Code2  18
2014  Cat.1  8   Structure2  Code2  123
2014  Cat.1  1   Structure2  Code2  18
2013  Cat.2  64  Structure2  Code2  59
2013  Cat.2  8   Structure2  Code2  18
2014  Cat.2  70  Structure2  Code2  123
2014  Cat.2  11  Structure2  Code2  18

and the result file I would like is that kind :

2013  Cat.1  10         Structure1  Code1  34.10
2014  Cat.1  25         Structure1  Code1  254.24
2013  Cat.2  250        Structure1  Code1  2456.4
2014  Cat.2  234        Structure1  Code1  2345.9
2013  Cat.1  6 (5+1)    Structure2  Code2  77 (59+18)
2014  Cat.1  9 (8+1)    Structure2  Code2  141 (123+18)
2013  Cat.2  72 (64+8)  Structure2  Code2  77 (59+18)
2014  Cat.2  81 (70+11) Structure2  Code2  141 (123+18)

Is this possible using awk? I only have 2 different fields on this example for the second structure, but could be much more...

I'm very new to programming and awk in particular.

Thanks for any answer!

Upvotes: 0

Views: 90

Answers (4)

Gilad Merran
Gilad Merran

Reputation: 61

This one-liner will do the job:

awk 'BEGIN{g=1;s="%4s %5s %-12s %10s %5s %-12s\n"} f{printf s ,$1,$2,$3+a" ("a"+"$3")",$4,$5,$6+b" ("b"+"$6")";f=0;g=0} /Structure2/{a=$3;b=$6;f=g;g=1} /Structure1/{printf s,$1,$2,$3,$4,$5,$6}' file

2013  Cat.1  10         Structure1  Code1  34.10
2014  Cat.1  25         Structure1  Code1  254.24
2013  Cat.2  250        Structure1  Code1  2456.4
2014  Cat.2  234        Structure1  Code1  2345.9
2013  Cat.1  6 (5+1)    Structure2  Code2  77 (59+18)
2014  Cat.1  9 (8+1)    Structure2  Code2  141 (123+18)
2013  Cat.2  72 (64+8)  Structure2  Code2  77 (59+18)
2014  Cat.2  81 (70+11) Structure2  Code2  141 (123+18)

I added a formatting for the alignment, I've used 12 (%-12s) for the third and the sixth columns - you can increase it if the numbers go higher.

Upvotes: 0

F. Knorr
F. Knorr

Reputation: 3055

Here is a possible answer:

awk 'BEGIN{FS="[ ]+"; OFS="\t";}
    NR==FNR{
        key = $1"-"$2"-"$4"-"$5
        idx[key] = idx[key]+1
        a[key][idx[key]] = $3
        c[key][idx[key]] = $6
    }
    NR!=FNR{
        key = $1"-"$2"-"$4"-"$5
        if(idx[key]==1){$1=$1; print ;next;}
        if(idx[key]<0){next;}
        line1 =" ("a[key][1]
        line2 =" ("c[key][1]
        sum1 = a[key][1]
        sum2 = c[key][1]
        for(i = 2; i< idx[key]; i++) {
            line1 = line1"+"a[key][i]
            line2 = line2"+"c[key][i]
            sum1 = sum1+a[key][i]
            sum2 = sum1+c[key][i]
        }
        sum1 = sum1 + a[key][idx[key]]
        sum2 = sum2 + c[key][idx[key]]
        line1 = sum1""line1"+"a[key][idx[key]]")" 
        line2 = sum2""line2"+"c[key][idx[key]]")" 
        print $1, $2, line1, $4, $5, line2
        idx[key] = -1
    }' inputFile inputFile

In this script, one ore more blanks are interpreted as field separators (FS="[ ]+"). In the output, fields are separated by a tab (OFS="\t").
Note that the script is called with two times inputFile as argument.
If your input really is a csv-file, then try exporting it with , as field separators and set FS=OFS=",".
Example output for the input given in the question:

2013    Cat.1   10  Structure1  Code1   34.10
2014    Cat.1   25  Structure1  Code1   254.24
2013    Cat.2   250 Structure1  Code1   2456.4
2014    Cat.2   234 Structure1  Code1   2345.9
2013    Cat.1   6 (5+1) Structure2  Code2   77 (59+18)
2014    Cat.1   9 (8+1) Structure2  Code2   141 (123+18)
2013    Cat.2   72 (64+8)   Structure2  Code2   77 (59+18)
2014    Cat.2   81 (70+11)  Structure2  Code2   141 (123+18)

Upvotes: 0

glenn jackman
glenn jackman

Reputation: 246799

Another awk answer, GNU awk specific. I assume you don't actually want to print out the addition formula.

gawk '
  { data[$1 OFS $2][$4 OFS $5][1] += $3
    data[$1 OFS $2][$4 OFS $5][2] += $6 }
  END {
    for (k1 in data) {
      for (k2 in data[k1]) {
        print k1, data[k1][k2][1], k2, data[k1][k2][2]
      }
    }
  }
' | sort -k4,5 -k2,2 -k1,1 | column -t
2013  Cat.1  10   Structure1  Code1  34.1
2014  Cat.1  25   Structure1  Code1  254.24
2013  Cat.2  250  Structure1  Code1  2456.4
2014  Cat.2  234  Structure1  Code1  2345.9
2013  Cat.1  6    Structure2  Code2  77
2014  Cat.1  9    Structure2  Code2  141
2013  Cat.2  72   Structure2  Code2  77
2014  Cat.2  81   Structure2  Code2  141

Upvotes: 1

karakfa
karakfa

Reputation: 67467

awk to the rescue!

Not the full solution but may give you ideas

$awk  '{
    k = $1 FS $2 FS $4 FS $5
    a[k] += $3
    as[k] = as[k] ? as[k] "+" $3 : "(" $3
    b[k] += $6
    bs[k] = bs[k] ? bs[k] "+" $6 : "(" $6
  }

  END {
    for (k in a) {
      print k, a[k], as[k] ")", b[k], bs[k] ")"
    }
  }' file

will give you

2014 Cat.2 Structure2 Code2 81 (70+11) 141 (123+18)
2014 Cat.1 Structure2 Code2 9 (8+1) 141 (123+18)
2014 Cat.2 Structure1 Code1 234 (234) 2345.9 (2345.9)
2014 Cat.1 Structure1 Code1 25 (25) 254.24 (254.24)
2013 Cat.2 Structure2 Code2 72 (64+8) 77 (59+18)
2013 Cat.1 Structure2 Code2 6 (5+1) 77 (59+18)
2013 Cat.2 Structure1 Code1 250 (250) 2456.4 (2456.4)
2013 Cat.1 Structure1 Code1 10 (10) 34.1 (34.10)

Note that the column order changed to reuse k and single entry values are also wrapped with parans. Both can be handled with little effort.

Upvotes: 2

Related Questions