Reputation: 31
I got a csv file with that kind of informations :
2013 Cat.1 10 Structure1 Code1 34.10
2014 Cat.1 25 Structure1 Code1 254.24
2013 Cat.2 250 Structure1 Code1 2456.4
2014 Cat.2 234 Structure1 Code1 2345.9
2013 Cat.1 5 Structure2 Code2 59
2013 Cat.1 1 Structure2 Code2 18
2014 Cat.1 8 Structure2 Code2 123
2014 Cat.1 1 Structure2 Code2 18
2013 Cat.2 64 Structure2 Code2 59
2013 Cat.2 8 Structure2 Code2 18
2014 Cat.2 70 Structure2 Code2 123
2014 Cat.2 11 Structure2 Code2 18
and the result file I would like is that kind :
2013 Cat.1 10 Structure1 Code1 34.10
2014 Cat.1 25 Structure1 Code1 254.24
2013 Cat.2 250 Structure1 Code1 2456.4
2014 Cat.2 234 Structure1 Code1 2345.9
2013 Cat.1 6 (5+1) Structure2 Code2 77 (59+18)
2014 Cat.1 9 (8+1) Structure2 Code2 141 (123+18)
2013 Cat.2 72 (64+8) Structure2 Code2 77 (59+18)
2014 Cat.2 81 (70+11) Structure2 Code2 141 (123+18)
Is this possible using awk? I only have 2 different fields on this example for the second structure, but could be much more...
I'm very new to programming and awk in particular.
Thanks for any answer!
Upvotes: 0
Views: 90
Reputation: 61
This one-liner will do the job:
awk 'BEGIN{g=1;s="%4s %5s %-12s %10s %5s %-12s\n"} f{printf s ,$1,$2,$3+a" ("a"+"$3")",$4,$5,$6+b" ("b"+"$6")";f=0;g=0} /Structure2/{a=$3;b=$6;f=g;g=1} /Structure1/{printf s,$1,$2,$3,$4,$5,$6}' file
2013 Cat.1 10 Structure1 Code1 34.10
2014 Cat.1 25 Structure1 Code1 254.24
2013 Cat.2 250 Structure1 Code1 2456.4
2014 Cat.2 234 Structure1 Code1 2345.9
2013 Cat.1 6 (5+1) Structure2 Code2 77 (59+18)
2014 Cat.1 9 (8+1) Structure2 Code2 141 (123+18)
2013 Cat.2 72 (64+8) Structure2 Code2 77 (59+18)
2014 Cat.2 81 (70+11) Structure2 Code2 141 (123+18)
I added a formatting for the alignment, I've used 12 (%-12s
) for the third and the sixth columns - you can increase it if the numbers go higher.
Upvotes: 0
Reputation: 3055
Here is a possible answer:
awk 'BEGIN{FS="[ ]+"; OFS="\t";}
NR==FNR{
key = $1"-"$2"-"$4"-"$5
idx[key] = idx[key]+1
a[key][idx[key]] = $3
c[key][idx[key]] = $6
}
NR!=FNR{
key = $1"-"$2"-"$4"-"$5
if(idx[key]==1){$1=$1; print ;next;}
if(idx[key]<0){next;}
line1 =" ("a[key][1]
line2 =" ("c[key][1]
sum1 = a[key][1]
sum2 = c[key][1]
for(i = 2; i< idx[key]; i++) {
line1 = line1"+"a[key][i]
line2 = line2"+"c[key][i]
sum1 = sum1+a[key][i]
sum2 = sum1+c[key][i]
}
sum1 = sum1 + a[key][idx[key]]
sum2 = sum2 + c[key][idx[key]]
line1 = sum1""line1"+"a[key][idx[key]]")"
line2 = sum2""line2"+"c[key][idx[key]]")"
print $1, $2, line1, $4, $5, line2
idx[key] = -1
}' inputFile inputFile
In this script, one ore more blanks are interpreted as field separators (
FS="[ ]+"
). In the output, fields are separated by a tab (OFS="\t"
).
Note that the script is called with two times inputFile
as argument.
If your input really is a csv-file, then try exporting it with ,
as field separators and set FS=OFS=","
.
Example output for the input given in the question:
2013 Cat.1 10 Structure1 Code1 34.10
2014 Cat.1 25 Structure1 Code1 254.24
2013 Cat.2 250 Structure1 Code1 2456.4
2014 Cat.2 234 Structure1 Code1 2345.9
2013 Cat.1 6 (5+1) Structure2 Code2 77 (59+18)
2014 Cat.1 9 (8+1) Structure2 Code2 141 (123+18)
2013 Cat.2 72 (64+8) Structure2 Code2 77 (59+18)
2014 Cat.2 81 (70+11) Structure2 Code2 141 (123+18)
Upvotes: 0
Reputation: 246799
Another awk answer, GNU awk specific. I assume you don't actually want to print out the addition formula.
gawk '
{ data[$1 OFS $2][$4 OFS $5][1] += $3
data[$1 OFS $2][$4 OFS $5][2] += $6 }
END {
for (k1 in data) {
for (k2 in data[k1]) {
print k1, data[k1][k2][1], k2, data[k1][k2][2]
}
}
}
' | sort -k4,5 -k2,2 -k1,1 | column -t
2013 Cat.1 10 Structure1 Code1 34.1
2014 Cat.1 25 Structure1 Code1 254.24
2013 Cat.2 250 Structure1 Code1 2456.4
2014 Cat.2 234 Structure1 Code1 2345.9
2013 Cat.1 6 Structure2 Code2 77
2014 Cat.1 9 Structure2 Code2 141
2013 Cat.2 72 Structure2 Code2 77
2014 Cat.2 81 Structure2 Code2 141
Upvotes: 1
Reputation: 67467
awk
to the rescue!
Not the full solution but may give you ideas
$awk '{
k = $1 FS $2 FS $4 FS $5
a[k] += $3
as[k] = as[k] ? as[k] "+" $3 : "(" $3
b[k] += $6
bs[k] = bs[k] ? bs[k] "+" $6 : "(" $6
}
END {
for (k in a) {
print k, a[k], as[k] ")", b[k], bs[k] ")"
}
}' file
will give you
2014 Cat.2 Structure2 Code2 81 (70+11) 141 (123+18)
2014 Cat.1 Structure2 Code2 9 (8+1) 141 (123+18)
2014 Cat.2 Structure1 Code1 234 (234) 2345.9 (2345.9)
2014 Cat.1 Structure1 Code1 25 (25) 254.24 (254.24)
2013 Cat.2 Structure2 Code2 72 (64+8) 77 (59+18)
2013 Cat.1 Structure2 Code2 6 (5+1) 77 (59+18)
2013 Cat.2 Structure1 Code1 250 (250) 2456.4 (2456.4)
2013 Cat.1 Structure1 Code1 10 (10) 34.1 (34.10)
Note that the column order changed to reuse k
and single entry values are also wrapped with parans. Both can be handled with little effort.
Upvotes: 2