M. Zidan

Reputation: 156

Sum row data over duplicate rows in Python

Let's say I have an array in Python that looks like this:

array=[[1,2,5,6],
       [1,3,6,7],
       [1,2,3,4],
       [2,3,9,8]]

I would like to build a new array from it that sums the 3rd and 4th columns over all rows sharing the same 1st and 2nd columns, i.e. the deduplicated array should look like this:

[[1,2,8,10],
 [1,3,6,7],
 [2,3,9,8]]

Is there a way to do it? I'm sure numpy has a cool function that does it efficiently but I cannot find it.

Upvotes: 0

Views: 47

Answers (2)

user3483203

Reputation: 51175

Using the numpy_indexed library, which provides a vectorized grouping operation and plenty of other utility functions:

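import numpy as np
import numpy_indexed as npi

arr = np.array([[1, 2, 5, 6], [1, 3, 6, 7], [1, 2, 3, 4], [2, 3, 9, 8]])

# group_by(keys).sum(values) returns (unique keys, per-group sums); hstack joins them column-wise
np.hstack(npi.group_by(arr[:, :2]).sum(arr[:, 2:]))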

array([[ 1,  2,  8, 10],
       [ 1,  3,  6,  7],
       [ 2,  3,  9,  8]])
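
If installing an extra library is not an option, the same grouping can be sketched with plain NumPy. This is only a minimal sketch, assuming the first two columns are the keys and all remaining columns are to be summed; the names `keys`, `inverse`, `sums` and `result` are illustrative. Note that `np.unique` returns the keys in sorted order, not in order of first appearance.

```python
import numpy as np

arr = np.array([[1, 2, 5, 6],
                [1, 3, 6, 7],
                [1, 2, 3, 4],
                [2, 3, 9, 8]])

# Unique key rows, plus for every input row the index of its key group.
keys, inverse = np.unique(arr[:, :2], axis=0, return_inverse=True)
inverse = inverse.ravel()  # guard against NumPy versions that return a 2-D inverse

# Accumulate the value columns into one row per group.
# np.add.at is unbuffered, so repeated group indices are summed correctly.
sums = np.zeros((len(keys), arr.shape[1] - 2), dtype=arr.dtype)
np.add.at(sums, inverse, arr[:, 2:])

result = np.hstack([keys, sums])
print(result)
```

For large arrays `np.add.at` can be slow; it is used here because it makes the accumulation step explicit.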

Upvotes: 2

jpp

Reputation: 164823

If you aren't concerned about performance, Pandas offers intuitive syntax:

import numpy as np, pandas as pd

A = np.array([[1,2,5,6],
              [1,3,6,7],
              [1,2,3,4],
              [2,3,9,8]])

res = pd.DataFrame(A).groupby([0, 1], sort=False).sum()\
        .reset_index().values

print(res)

[[ 1  2  8 10]
 [ 1  3  6  7]
 [ 2  3  9  8]]

Upvotes: 1
