Justin Lange

Reputation: 907

Create unique row in 2D numpy array by adding corresponding values

I have a 2-dimensional NumPy array in Python:

[[ 1  2  1  3  3]
 [10 20 30 40 60]]

I would like to keep only the unique values in the first row, summing the corresponding values in the second row before dropping the duplicate columns. For my array, the output would be:

[[  1   2   3 ]
 [ 40  20 100 ]]

I'm new to Python and can't think of an efficient way to do this at larger scales.

Upvotes: 1

Views: 70

Answers (4)

sacuL

Reputation: 51425

Unfortunately, numpy doesn't have a built-in groupby function (though there are ways to write one). If you're open to using pandas, this is straightforward:

>>> import pandas as pd
>>> pd.DataFrame(a.T).groupby(0, as_index=False).sum().values.T

array([[  1,   2,   3],
       [ 40,  20, 100]])
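
For readers who would rather stay within numpy, one way to write such a groupby is sketched below. It is an illustration, not part of the answer above: the name group_sum is made up for this example, and it assumes the keys sit in row 0 and the values in row 1, as in the question.

```python
import numpy as np

def group_sum(a):
    """Sum row 1 of `a` grouped by the values in row 0 (hypothetical helper)."""
    # keys: sorted unique values; inverse: index into keys for every column
    keys, inverse = np.unique(a[0], return_inverse=True)
    sums = np.zeros(len(keys), dtype=a.dtype)
    # Unbuffered in-place add, so repeated indices accumulate correctly
    np.add.at(sums, inverse, a[1])
    return np.stack((keys, sums))

a = np.array([[ 1,  2,  1,  3,  3],
              [10, 20, 30, 40, 60]])
result = group_sum(a)  # rows: [1, 2, 3] and [40, 20, 100]
```

np.add.at is used instead of `sums[inverse] += a[1]` because the latter applies only one of the updates when an index repeats.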

Upvotes: 3

user3483203

Reputation: 51185

You can use a sparse.csr_matrix:

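import numpy as np
from scipy import sparse

b = a[0]          # keys, used as column indices
v = a[1]          # values to sum
m = b.max() + 1   # one column per possible key 0..max
s = v.shape[0]    # one row per input column

# Row i holds the single value v[i] in column b[i]; summing over rows adds values that share a key
res = sparse.csr_matrix((v, b, np.arange(s+1)), (s, m)).sum(0)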

matrix([[  0,  40,  20, 100]], dtype=int32)

This gives the sum for every key from 0 to a[0].max(), including keys that never occur (here 0). To get back to your desired result, select only the keys that are present:

t = np.unique(a[0])
np.stack((t, res.A1[t]))

array([[  1,   2,   3],
       [ 40,  20, 100]])
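
As a side note, the same dense vector of per-key sums can be obtained without scipy by using np.bincount with weights. This is a swapped-in technique, not the answer's method, and it assumes the keys are non-negative integers (a requirement of np.bincount). The sums come back as floats, so the sketch casts them to the input dtype.

```python
import numpy as np

a = np.array([[ 1,  2,  1,  3,  3],
              [10, 20, 30, 40, 60]])

# One slot per key 0..a[0].max(); slot k holds the sum of values whose key is k
dense = np.bincount(a[0], weights=a[1])  # values 0, 40, 20, 100 (as floats)

# Keep only the keys that actually occur
keys = np.unique(a[0])
result = np.stack((keys, dense[keys].astype(a.dtype)))
```

Like the sparse version, this allocates a slot for every integer up to the largest key, so it suits small, dense key ranges best.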

Upvotes: 0

chthonicdaemon

Reputation: 19830

I don't think you'll get much more efficient than accumulating the sums in a dictionary and then creating the array from that:

from collections import defaultdict
import numpy

sums = defaultdict(float)

arr = numpy.array([[ 1,  2,  1,  3,  3],
                   [10, 20, 30, 40, 60]])

for key, value in zip(*arr):
    sums[key] += value


numpy.array(list(sums.items())).T

returns

array([[  1.,   2.,   3.],
       [ 40.,  20., 100.]])
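
The result is floating point because the dictionary is a defaultdict(float). A minimal variant that keeps integers and returns the keys in sorted order might look like the sketch below. The helper name group_sum_pairs is hypothetical, and it works on plain Python sequences rather than on the array.

```python
from collections import defaultdict

def group_sum_pairs(keys, values):
    """Sum `values` grouped by `keys`; returns [sorted_keys, sums] (hypothetical helper)."""
    sums = defaultdict(int)  # int default keeps integer inputs as integers
    for key, value in zip(keys, values):
        sums[key] += value
    ordered = sorted(sums)   # dicts keep insertion order, so sort explicitly
    return [ordered, [sums[key] for key in ordered]]

result = group_sum_pairs([1, 2, 1, 3, 3], [10, 20, 30, 40, 60])
# result == [[1, 2, 3], [40, 20, 100]]
```

Passing the two rows of the array (for example `group_sum_pairs(*arr.tolist())`) and wrapping the result in numpy.array would give the same layout as the question asks for.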

Upvotes: 0

onno

Reputation: 979

import numpy as np

a = np.array([[ 1,  2,  1,  3,  3],
              [10, 20, 30, 40, 60]])

unique_values = np.unique(a[0])
new_array = np.zeros((2, len(unique_values)))
for i, uniq in enumerate(unique_values):
    new_array[0][i] = uniq
    # sum the second-row values whose first-row entry equals uniq
    new_array[1][i] = np.where(a[0]==uniq,a[1],0).sum()
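
The loop above scans the whole array once per unique value, which can get slow when there are many distinct keys. One possible alternative, offered here only as a sketch, is to sort the columns by key and sum each run of equal keys with np.add.reduceat. The function name group_sum_sorted is made up for this example, and it assumes a non-empty array.

```python
import numpy as np

def group_sum_sorted(a):
    """Group-sum via sort + reduceat (hypothetical helper; assumes a non-empty array)."""
    order = np.argsort(a[0], kind="stable")
    keys_sorted = a[0][order]
    vals_sorted = a[1][order]
    # A group starts at position 0 and wherever the sorted key changes
    is_start = np.r_[True, keys_sorted[1:] != keys_sorted[:-1]]
    starts = np.flatnonzero(is_start)
    # reduceat sums vals_sorted[starts[i]:starts[i+1]] for each group
    sums = np.add.reduceat(vals_sorted, starts)
    return np.stack((keys_sorted[starts], sums))

a = np.array([[ 1,  2,  1,  3,  3],
              [10, 20, 30, 40, 60]])
result = group_sum_sorted(a)  # rows: [1, 2, 3] and [40, 20, 100]
```

Unlike the loop, this keeps the integer dtype of the input, since no float array of zeros is involved.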

Upvotes: 0
