Mission
Mission

Reputation: 1297

list group and sum by column

I have the following

    (Decimal('1.000'), Decimal('419.760000'), Decimal('4.197600000'), Decimal('423.957600000'))             
(Decimal('1.000'), Decimal('62.370000'), Decimal('0.623700000'), Decimal('62.993700000'))           
(Decimal('2.000'), Decimal('7.920000'), Decimal('0.079200000'), Decimal('7.999200000'))

And I'd like to group them by the first column and sum the other columns (sum grouped by the first column, summarized for each column separately)...but I don't know how to do that...

I'm new to python...any pointers?

Thanks, BR

Upvotes: 0

Views: 2411

Answers (2)

Lauritz V. Thaulow
Lauritz V. Thaulow

Reputation: 50995

Assuming those three tuples are items in a tuple or list numbers:

column_sums = [sum(items) for items in zip(*numbers)]

Rereading your question, I think you may instead mean you want to group all numbers except the first of each row by what the number in the first row is, and then get the sum of each group. If so, do it like this:

from collections import defaultdict

grouped = defaultdict(list)

for tpl in numbers:
    grouped[tpl[0]].extend(tpl[1:])

group_sums = dict((key, sum(lst)) for key, lst in grouped.items())

If you don't need the itermediate grouped variable, you may optimize like this:

group_sums = defaultdict(int)

for tpl in numbers:
    group_sums[tpl[0]] += sum(tpl[1:])

Re: comment

This would have been so much easier if you gave an example of the output you wanted in the first place. For example, you could have added this to your post:

From the above example I want this output:

{Decimal('1.000'): [
   Decimal('482.130000'), Decimal('4.821300000'), Decimal('486.951300000')],
Decimal('2.000'): [
   Decimal('7.920000'), Decimal('0.079200000'), Decimal('7.999200000')]}

Then I could have posted this answer right away:

from itertools import izip_longest

group_sums = {}

for tpl in numbers:
    previous_sum = group_sums.get(tpl[0], [])
    iterator = izip_longest(previous_sum, tpl[1:], fillvalue=0)
    group_sums[tpl[0]] = [prev + num for prev, num in iterator]

This also works if the number of columns varies within a group. Please tell me I've understood the question correctly this time. :)

Upvotes: 3

Artsiom Rudzenka
Artsiom Rudzenka

Reputation: 29113

If the following input given:

inputData = [[1,2,3],
             [1,3,4],
             [5,6,7]]

And you expect to get grouped sum - e.g sum of first two(grouped by 1 from first col) rows and the third row itself(since there is no more rows with 5 within first column) then you may use the following code:

res = []
for i,val in enumerate(zip(*inputData )[0]):# first column
   filtered = filter(lambda x: x[0] == val, inputData)
   (not filtered in res and res.append(filtered))

print map(lambda comb: map(sum, zip(*comb)), res)

Upvotes: 0

Related Questions