Reputation: 907
I have a 2-dimensional numpy array in python:
[[ 1 2 1 3 3]
[10 20 30 40 60]]
I would like to keep only unique values in the first row, adding together the corresponding values in the second row before dropping the duplicate columns. So, the output for my array would be this:
[[ 1 2 3 ]
[ 40 20 100 ]]
I'm a newbie to Python and can't think of an efficient way to do this at larger scales.
Upvotes: 1
Views: 70
Reputation: 51425
Unfortunately, numpy doesn't have a built-in groupby function (though there are ways to write one). If you're open to using pandas, this is straightforward:
>>> import pandas as pd
>>> pd.DataFrame(a.T).groupby(0, as_index=False).sum().values.T
array([[ 1, 2, 3],
[ 40, 20, 100]])
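One of those ways to write it in plain NumPy, as a minimal sketch (assuming the same 2×5 array a from the question): np.unique with return_inverse=True gives each column a group index, and np.add.at accumulates the second row into one slot per group:

```python
import numpy as np

a = np.array([[ 1,  2,  1,  3,  3],
              [10, 20, 30, 40, 60]])

# keys: sorted unique values of the first row
# inv:  for each column, the index of its key within `keys`
keys, inv = np.unique(a[0], return_inverse=True)

# accumulate the second row into one slot per group
sums = np.zeros(len(keys), dtype=a[1].dtype)
np.add.at(sums, inv, a[1])

np.stack((keys, sums))
# array([[  1,   2,   3],
#        [ 40,  20, 100]])
```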
Upvotes: 3
Reputation: 51185
You can use a sparse.csr_matrix
:
from scipy import sparse
import numpy as np

b = a[0]         # keys (used as column indices)
v = a[1]         # values to accumulate
m = b.max() + 1  # number of columns: one per possible key
s = v.shape[0]   # number of entries
res = sparse.csr_matrix((v, b, np.arange(s + 1)), (s, m)).sum(0)
matrix([[ 0, 40, 20, 100]], dtype=int32)
This gives the sum for every key from 0 to a[0].max() (keys that never occur sum to 0), so to link it back to your expected result:
t = np.unique(a[0])
np.stack((t, res.A1[t]))
array([[ 1, 2, 3],
[ 40, 20, 100]])
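The same dense-accumulator idea (one slot per possible key from 0 to a[0].max()) can also be had without scipy via np.bincount, which accepts a weights argument. A sketch, assuming the same array a:

```python
import numpy as np

a = np.array([[ 1,  2,  1,  3,  3],
              [10, 20, 30, 40, 60]])

# bincount sums the weights per key over the range 0..a[0].max()
res = np.bincount(a[0], weights=a[1]).astype(a[1].dtype)  # [0, 40, 20, 100]

# keep only the keys that actually occur
t = np.unique(a[0])
np.stack((t, res[t]))
# array([[  1,   2,   3],
#        [ 40,  20, 100]])
```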
Upvotes: 0
Reputation: 19830
I don't think you'll get much more efficient than using a dictionary for the sums and then creating the array from that:
from collections import defaultdict

import numpy

sums = defaultdict(float)
arr = numpy.array([[ 1,  2,  1,  3,  3],
                   [10, 20, 30, 40, 60]])
for key, value in zip(*arr):
    sums[key] += value

numpy.array(list(sums.items())).T
returns
array([[ 1., 2., 3.],
[ 40., 20., 100.]])
Upvotes: 0
Reputation: 979
import numpy as np

a = np.array([[ 1,  2,  1,  3,  3],
              [10, 20, 30, 40, 60]])
unique_values = np.unique(a[0])
new_array = np.zeros((2, len(unique_values)))
for i, uniq in enumerate(unique_values):
    new_array[0][i] = uniq
    new_array[1][i] = np.where(a[0] == uniq, a[1], 0).sum()
Upvotes: 0