Reputation: 6217
I have a pair lists of the same length, the first containing int
values and the second contains float
values. I wish to replace these with another pair of lists which may be shorter, but still have the same length, in which the first list will contain only unique values, and the second list will contain the sums for each matching value. That is, if the i'th element of the first list in the new pair is x
, and the indices in the first list of the original pair in which x
has appeared are i_1,...,i_k
, then the i'th element of the second list in the new pair should contain the sum of the values in indices i_1,...,i_k
in the second list of the original pair.
An example will clarify.
Input:
([1, 2, 2, 1, 1, 3], [0.1, 0.2, 0.3, 0.4, 0.5, 1.0])
Ourput:
([1, 2, 3], [1.0, 0.5, 1.0])
I was trying to do some list comprehension trick here but failed. I can write a silly loop function for that, but I believe there should be something much nicer here.
Upvotes: 3
Views: 1663
Reputation: 27702
Build a map with the keys:
la,lb = ([1, 2, 2, 1, 1, 3], [0.1, 0.2, 0.3, 0.4, 0.5, 1.0])
m = {k:0.0 for k in la}
And fill it with the summations:
for i in xrange(len(lb)):
m[la[i]] += lb[i]
Finally, from your map:
zip(*[(k,m[k]) for k in m]*1)
Upvotes: 1
Reputation: 250981
Not a one-liner, but since you've not posted your solution I'll suggest this solution that is using collections.OrderedDict
:
>>> from collections import OrderedDict
>>> a, b = ([1, 2, 2, 1, 1, 3], [0.1, 0.2, 0.3, 0.4, 0.5, 1.0])
>>> d = OrderedDict()
>>> for k, v in zip(a, b):
... d[k] = d.get(k, 0) + v
...
>>> d.keys(), d.values()
([1, 2, 3], [1.0, 0.5, 1.0])
Of course if order doesn't matter then it's better to use collections.defaultdict
:
>>> from collections import defaultdict
>>> a, b = ([1, 'foo', 'foo', 1, 1, 3], [0.1, 0.2, 0.3, 0.4, 0.5, 1.0])
>>> d = defaultdict(int)
>>> for k, v in zip(a, b):
d[k] += + v
...
>>> d.keys(), d.values()
([3, 1, 'foo'], [1.0, 1.0, 0.5])
Upvotes: 3
Reputation: 77961
one way to go is using pandas
:
>>> import pandas as pd
>>> df = pd.DataFrame({'tag':[1, 2, 2, 1, 1, 3],
'val':[0.1, 0.2, 0.3, 0.4, 0.5, 1.0]})
>>> df
tag val
0 1 0.1
1 2 0.2
2 2 0.3
3 1 0.4
4 1 0.5
5 3 1.0
>>> df.groupby('tag')['val'].aggregate('sum')
tag
1 1.0
2 0.5
3 1.0
Name: val, dtype: float64
Upvotes: 3