Reputation: 12108
So I have a laaaaaaaarge file like this:
Item|Cost1|Cost2
Pizza|50|25
Sugar|100|100
Spices|100|200
Pizza|100|25
Sugar|200|100
Pizza|50|100
I want to add all Cost1
s and Cost2
s for a particular item and produce a merged output.
I've written a python code to do this,
item_dict = {}
for line in file:
fields = line.split('|')
item = fields[0]
cost1 = fields[1]
cost2 = fields[2]
if item_dict.has_key(item):
item_dict[item][0] += int(cost1)
item_dict[item][1] += int(cost2)
else:
item_dict[item] = [int(cost1),int(cost2)]
for key, val in item_dict.items():
print key,"|".join(val)
Is there anyway to do this very efficiently and quickly in awk or using any other wizardry?
Or can I make my python more elegant and faster?
Expected Output
Pizza|200|150
Sugar|300|200
Spices|100|200
Upvotes: 2
Views: 280
Reputation: 203209
awk '
BEGIN { FS=OFS="|" }
NR==1 { expectedNF = NF; next }
NF != expectedNF { print "Fix your #%@#&! data, idiot!"; exit 1 }'
{
items[$1]
for (c=2;c<=NF;c++)
cost[$1,c] += $c
}
END {
for (i in items) {
printf "%s", i
for (c=2;c<=NF;c++)
printf "%s%s", OFS, cost[i,c]
print ""
}
}
' file
Feel free to compress it onto 1 or 2 lines as you see fit.
Upvotes: 4
Reputation: 85775
In practice I would have done what fedorqui did. For completeness however, this python
script should be faster than your original:
#!/usr/bin/env python
import fileinput
item_dict = {}
for line in fileinput.input():
if not fileinput.isfirstline():
fields = line.strip().split('|')
item = fields[0]
cost1 = int(fields[1])
cost2 = int(fields[2])
try:
item_dict[item][0] += cost1
item_dict[item][1] += cost2
except KeyError:
item_dict[item] = [cost1, cost2]
for key, val in item_dict.items():
print "%s|%s|%s" % (key,val[0],val[1])
Save the script to a file such as sumcols
and make it executable chmod +x sumcols
and run like:
$ ./sumcols file
Spices|100|200
Sugar|300|200
Pizza|200|150
Upvotes: 1
Reputation: 289505
Something like this...
$ awk 'BEGIN{OFS=FS="|"}
NR>1 {cost1[$1]+=$2; cost2[$1]+=$3}
END{ for (i in cost1) print i, cost1[i], cost2[i]}' file
Sugar|300|200
Spices|100|200
Pizza|200|150
BEGIN{OFS=FS="|"}
sets the (input & output) field separator to be |
.NR>1
means that we are going to do some actions for line number bigger than 1. This way we skip the header.cost1
and cost2
are arrays whose index is the first field and its value is the sum till that point.END {}
is something we do after reading the whole file. It consists in looping through the array and printing the values.Upvotes: 10