Add some row data of duplicates in Python

Question

Let's say, I have an array in Python that goes like this:

array=[[1,2,5,6],
       [1,3,6,7],
       [1,2,3,4],
       [2,3,9,8]]

I would like to build a new array from it that sums the 3rd and 4th columns over all rows that share the same 1st and 2nd values. That is, the resulting array of unique rows should look like this:

[[1,2,8,10],
 [1,3,6,7],
 [2,3,9,8]]

Is there a way to do it? I'm sure numpy has a cool function that does it efficiently but I cannot find it.

user3483203 · Accepted Answer

Using the numpy_indexed library, which provides a vectorized grouping operation and plenty of other utility functions:

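import numpy as np
import numpy_indexed as npi

# assumes `array` is the nested list from the question
arr = np.array(array)

# group rows by the first two columns, sum the remaining columns per group
np.hstack(npi.group_by(arr[:, :2]).sum(arr[:, 2:]))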

array([[ 1,  2,  8, 10],
       [ 1,  3,  6,  7],
       [ 2,  3,  9,  8]])
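
If installing an extra library is not an option, the same grouping can probably be done with NumPy alone. The following is a minimal sketch, not part of the original answer: it assumes the first two columns are the key and all remaining columns are summed, and the names `keys`, `inverse`, `sums` and `result` are made up for illustration. `np.unique` with `axis=0` finds the distinct key rows and maps each input row to its group, and `np.add.at` accumulates the values per group.

```python
import numpy as np

arr = np.array([[1, 2, 5, 6],
                [1, 3, 6, 7],
                [1, 2, 3, 4],
                [2, 3, 9, 8]])

# Distinct (col 0, col 1) pairs, plus the group index of every input row.
keys, inverse = np.unique(arr[:, :2], axis=0, return_inverse=True)
# Flatten defensively: the shape of `inverse` has varied between NumPy releases.
inverse = inverse.ravel()

# One accumulator row per distinct key, one column per summed column.
sums = np.zeros((len(keys), arr.shape[1] - 2), dtype=arr.dtype)
# Unbuffered in-place add, so repeated group indices accumulate correctly.
np.add.at(sums, inverse, arr[:, 2:])

result = np.hstack([keys, sums])
print(result)
```

Note that `np.unique` returns the keys in sorted order, so the output rows are ordered by key rather than by first appearance in the input.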
