Bach

Reputation: 6217

Sum and aggregate over a list with equal values

I have a pair of lists of the same length, the first containing int values and the second containing float values. I wish to replace these with another pair of lists, possibly shorter but still equal in length to each other, in which the first list contains only unique values and the second list contains the sum for each matching value. That is, if the i-th element of the first list in the new pair is x, and the indices at which x appears in the first list of the original pair are i_1,...,i_k, then the i-th element of the second list in the new pair should be the sum of the values at indices i_1,...,i_k in the second list of the original pair.

An example will clarify.

Input:

([1, 2, 2, 1, 1, 3], [0.1, 0.2, 0.3, 0.4, 0.5, 1.0])

Output:

([1, 2, 3], [1.0, 0.5, 1.0])

I was trying to do some list comprehension trick here but failed. I can write a silly loop function for that, but I believe there should be something much nicer here.

Upvotes: 3

Views: 1663

Answers (3)

Build a map with the keys:

la,lb = ([1, 2, 2, 1, 1, 3], [0.1, 0.2, 0.3, 0.4, 0.5, 1.0])
m = {k:0.0 for k in la}

And fill it with the summations:

for k, v in zip(la, lb):
    m[k] += v

Finally, from your map:

zip(*m.items())
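On Python 3, zip returns a lazy iterator, so the last step needs its result turned into lists explicitly. A minimal sketch of the same three steps, assuming Python 3.7+ (where a plain dict keeps insertion order) and non-empty input lists; the names keys and sums are just illustrative:

```python
la, lb = [1, 2, 2, 1, 1, 3], [0.1, 0.2, 0.3, 0.4, 0.5, 1.0]

# Step 1: one zero-initialised slot per distinct key.
m = {k: 0.0 for k in la}

# Step 2: add each value to the slot of its key.
for k, v in zip(la, lb):
    m[k] += v

# Step 3: split the (key, sum) pairs back into two lists.
# Assumes m is non-empty; unpacking would fail on empty input.
keys, sums = (list(t) for t in zip(*m.items()))

print(keys, sums)  # [1, 2, 3] [1.0, 0.5, 1.0]
```

The keys come out in order of first appearance only because of the dict ordering guarantee; on older interpreters the order is arbitrary.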

Upvotes: 1

Ashwini Chaudhary

Reputation: 250981

Not a one-liner, but since you haven't posted your own solution, here is one that uses collections.OrderedDict:

>>> from collections import OrderedDict
>>> a, b = ([1, 2, 2, 1, 1, 3], [0.1, 0.2, 0.3, 0.4, 0.5, 1.0])
>>> d = OrderedDict()
>>> for k, v in zip(a, b):
...     d[k] = d.get(k, 0) + v
...     
>>> d.keys(), d.values()
([1, 2, 3], [1.0, 0.5, 1.0])

Of course if order doesn't matter then it's better to use collections.defaultdict:

>>> from collections import defaultdict
>>> a, b = ([1, 'foo', 'foo', 1, 1, 3], [0.1, 0.2, 0.3, 0.4, 0.5, 1.0])
>>> d = defaultdict(int)
>>> for k, v in zip(a, b):
...     d[k] += v
...     
>>> d.keys(), d.values()
([3, 1, 'foo'], [1.0, 1.0, 0.5])
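The outputs above are from Python 2, where keys() and values() return lists and a plain dict has no defined order. On Python 3.7+ every dict, including defaultdict, keeps insertion order, so the two variants can be folded into one. A rough sketch under that assumption; sum_by_key is a hypothetical helper name, not something from the standard library:

```python
from collections import defaultdict


def sum_by_key(keys, values):
    """Return (unique keys in first-seen order, per-key sums of values)."""
    totals = defaultdict(float)  # missing keys start at 0.0
    for k, v in zip(keys, values):
        totals[k] += v
    # keys() and values() are views on Python 3, so copy them into lists.
    return list(totals.keys()), list(totals.values())


a, b = [1, 'foo', 'foo', 1, 1, 3], [0.1, 0.2, 0.3, 0.4, 0.5, 1.0]
uniq, sums = sum_by_key(a, b)
print(uniq, sums)
```

Using defaultdict(float) rather than defaultdict(int) makes no difference to the sums here; it only keeps the default value the same type as the data.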

Upvotes: 3

behzad.nouri

Reputation: 77961

One way to go is to use pandas:

>>> import pandas as pd
>>> df = pd.DataFrame({'tag':[1, 2, 2, 1, 1, 3], 
                       'val':[0.1, 0.2, 0.3, 0.4, 0.5, 1.0]})
>>> df
   tag  val
0    1  0.1
1    2  0.2
2    2  0.3
3    1  0.4
4    1  0.5
5    3  1.0
>>> df.groupby('tag')['val'].aggregate('sum')
tag
1      1.0
2      0.5
3      1.0
Name: val, dtype: float64

Upvotes: 3
