Reputation: 1297
I have the following
(Decimal('1.000'), Decimal('419.760000'), Decimal('4.197600000'), Decimal('423.957600000'))
(Decimal('1.000'), Decimal('62.370000'), Decimal('0.623700000'), Decimal('62.993700000'))
(Decimal('2.000'), Decimal('7.920000'), Decimal('0.079200000'), Decimal('7.999200000'))
And I'd like to group them by the first column and sum the other columns (sum grouped by the first column, summarized for each column separately)...but I don't know how to do that...
I'm new to python...any pointers?
Thanks, BR
Upvotes: 0
Views: 2411
Reputation: 50995
Assuming those three tuples are items in a tuple or list numbers
:
column_sums = [sum(items) for items in zip(*numbers)]
Rereading your question, I think you may instead mean you want to group all numbers except the first of each row by what the number in the first row is, and then get the sum of each group. If so, do it like this:
from collections import defaultdict
grouped = defaultdict(list)
for tpl in numbers:
grouped[tpl[0]].extend(tpl[1:])
group_sums = dict((key, sum(lst)) for key, lst in grouped.items())
If you don't need the itermediate grouped
variable, you may optimize like this:
group_sums = defaultdict(int)
for tpl in numbers:
group_sums[tpl[0]] += sum(tpl[1:])
Re: comment
This would have been so much easier if you gave an example of the output you wanted in the first place. For example, you could have added this to your post:
From the above example I want this output:
{Decimal('1.000'): [ Decimal('482.130000'), Decimal('4.821300000'), Decimal('486.951300000')], Decimal('2.000'): [ Decimal('7.920000'), Decimal('0.079200000'), Decimal('7.999200000')]}
Then I could have posted this answer right away:
from itertools import izip_longest
group_sums = {}
for tpl in numbers:
previous_sum = group_sums.get(tpl[0], [])
iterator = izip_longest(previous_sum, tpl[1:], fillvalue=0)
group_sums[tpl[0]] = [prev + num for prev, num in iterator]
This also works if the number of columns varies within a group. Please tell me I've understood the question correctly this time. :)
Upvotes: 3
Reputation: 29113
If the following input given:
inputData = [[1,2,3],
[1,3,4],
[5,6,7]]
And you expect to get grouped sum - e.g sum of first two(grouped by 1 from first col) rows and the third row itself(since there is no more rows with 5 within first column) then you may use the following code:
res = []
for i,val in enumerate(zip(*inputData )[0]):# first column
filtered = filter(lambda x: x[0] == val, inputData)
(not filtered in res and res.append(filtered))
print map(lambda comb: map(sum, zip(*comb)), res)
Upvotes: 0