Reputation: 2238
I have a numpy 2D array as follows
gona = array([['a1', 3], ['a2', 5], ['a3', 1], ['a3', 2], ['a3', 1], ['a1', 7]])
This array has 2 columns
What I want to do is create a new array with 2 columns. Column 1 should hold the unique labels 'a1', 'a2', 'a3', and column 2 should hold the sum of the values that correspond to each label.
new_gona = array([['a1', 10], ['a2', 5], ['a3', 4]])
Here, corresponding values are taken as follows.
'a1' : 3 + 7 = 10
'a2' : 5
'a3' : 1 + 2 + 1 = 4
What would be an easy method to achieve this?
Upvotes: 0
Views: 701
Reputation: 5373
A list comprehension handles this pretty easily:
def fst(x): return x[0]
[(a, sum([int(m[1]) for m in gona if a == m[0]])) for a in set(map(fst, gona)) ]
This is basic Python; no libraries are involved. The first function is defined only to avoid a lambda expression in the map call at the end. Both the Pandas and the NumPy solutions already mentioned seem pretty interesting though. +1 for both!
Upvotes: 0
Reputation: 67427
A numpy only solution:
>>> labels, indices = np.unique(gona[:, 0], return_inverse=True)
>>> sums = np.bincount(indices, weights=gona[:, 1].astype(float))
>>> new_gona = np.column_stack((labels, sums))
>>> new_gona
array([['a1', '10'],
['a2', '5.'],
['a3', '4.']],
dtype='|S2')
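For readers on a current Python 3 / NumPy setup, here is a minimal self-contained sketch of the same unique-plus-bincount idea. It assumes the `gona` array from the question; the name `result` and the final dictionary step are only there for illustration and are not part of the answer above.

```python
import numpy as np

gona = np.array([['a1', 3], ['a2', 5], ['a3', 1],
                 ['a3', 2], ['a3', 1], ['a1', 7]])

# labels: the sorted unique keys
# indices: for every row, the position of its key inside labels
labels, indices = np.unique(gona[:, 0], return_inverse=True)

# bincount adds up all weights that share the same index;
# the second column is stored as strings, so convert it to float first
sums = np.bincount(indices, weights=gona[:, 1].astype(float))

# result is a hypothetical name, used here only to display the pairing
result = dict(zip(labels.tolist(), sums.tolist()))
print(result)
```

Because `np.unique` returns the labels sorted, the sums come out in label order without an extra sorting step.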
Upvotes: 2
Reputation: 2755
Use pandas and its indexing magic:
import pandas as pd
import numpy as np
gona = np.array([['a1', 3], ['a2', 5], ['a3', 1],
['a3', 2], ['a3', 1], ['a1', 7]])
# Create series where second items are data and first items are index
series = pd.Series(gona[:, 1], gona[:, 0], dtype=float)
# Compute sums across index
sums = series.groupby(level=0).sum()
# Construct new array in the format you want
new_gona = np.array(list(zip(sums.index, sums.values)))
new_gona
# out[]:
# array([['a1', '10.0'],
# ['a2', '5.0'],
# ['a3', '4.0']],
# dtype='|S4')
It's also worth noting that an np.array can only hold one datatype, so your mix of strings and numeric types has to be corrected for by specifying a numeric dtype such as float (spelled np.float in older NumPy releases). You can use int if you want.
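A small sketch of that single-dtype behaviour, assuming Python 3 and a recent NumPy. The names `mixed` and `values` are made up for this illustration.

```python
import numpy as np

# Mixing strings and integers: NumPy picks one dtype for the whole array
mixed = np.array([['a1', 3], ['a2', 5]])

# kind 'U' means a Unicode string dtype: the integers were stored as text
print(mixed.dtype.kind)

# The second column compares equal to the string '3', not the number 3
print(mixed[0, 1] == '3')

# Convert the numeric column explicitly before doing arithmetic on it
values = mixed[:, 1].astype(float)
print(values.sum())
```

This is why each answer here converts the second column (with `astype`, `dtype=`, or `int(...)`) before summing.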
Upvotes: 3
Reputation: 34698
from collections import defaultdict
from operator import itemgetter
sums = defaultdict(int)
for key, value in gona:
    sums[key] += int(value)  # array entries are strings, so convert before adding
new_gona = sorted(sums.items(), key=itemgetter(0))
Cheat?
Upvotes: 1
Reputation: 3109
You have to loop over gona and store each label (e.g. 'a1') as a key in a dictionary, adding each row's value to the running total for that key.
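A minimal sketch of that loop, assuming the data from the question. Here `gona` is written as a plain list so the snippet runs without NumPy; the name `totals` is made up for this example.

```python
gona = [['a1', 3], ['a2', 5], ['a3', 1], ['a3', 2], ['a3', 1], ['a1', 7]]

totals = {}
for key, value in gona:
    # int() also covers the NumPy case, where the values arrive as strings
    totals[key] = totals.get(key, 0) + int(value)

# Sort by label to get a stable order
new_gona = sorted(totals.items())
print(new_gona)  # [('a1', 10), ('a2', 5), ('a3', 4)]
```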
Upvotes: -2