Reputation: 156
Let's say I have an array in Python that looks like this:
array=[[1,2,5,6],
[1,3,6,7],
[1,2,3,4],
[2,3,9,8]]
I would like to build a new array from it that sums the 3rd and 4th columns for rows whose 1st and 2nd columns are duplicates. That is, the resulting array should look like this:
[[1,2,8,10],
[1,3,6,7],
[2,3,9,8]]
Is there a way to do this? I'm sure NumPy has a function that does it efficiently, but I cannot find it.
Upvotes: 0
Views: 47
Reputation: 51175
You can use the numpy_indexed library, which provides a vectorized grouping operation and plenty of other utility functions:
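import numpy as np
import numpy_indexed as npi
arr = np.array(array)  # the nested list from the question
np.hstack(npi.group_by(arr[:, :2]).sum(arr[:, 2:]))  # group on columns 0-1, sum columns 2-3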
array([[ 1, 2, 8, 10],
[ 1, 3, 6, 7],
[ 2, 3, 9, 8]])
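If installing an extra library is not an option, the same grouping can be sketched with plain NumPy. This is only a minimal sketch, not part of numpy_indexed: it assumes an integer array whose first two columns are the keys, and the names `keys`, `inverse`, `sums` and `result` are just illustrative. `np.unique` with `axis=0` finds the distinct key rows, and `np.add.at` accumulates the remaining columns per key:

```python
import numpy as np

arr = np.array([[1, 2, 5, 6],
                [1, 3, 6, 7],
                [1, 2, 3, 4],
                [2, 3, 9, 8]])

# Distinct key rows (first two columns) and, for every input row,
# the index of the key row it belongs to.
keys, inverse = np.unique(arr[:, :2], axis=0, return_inverse=True)
inverse = inverse.ravel()  # guard: some NumPy versions return a 2-D inverse here

# Accumulate the value columns into one row per key.
# np.add.at is unbuffered, so repeated indices are summed correctly.
sums = np.zeros((len(keys), arr.shape[1] - 2), dtype=arr.dtype)
np.add.at(sums, inverse, arr[:, 2:])

result = np.hstack([keys, sums])
print(result)
```

Note that `np.unique` returns the keys in sorted order, which happens to match the order asked for in the question; if the original first-appearance order matters for other inputs, this sketch would need an extra reordering step.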
Upvotes: 2
Reputation: 164823
If you aren't concerned about performance, Pandas offers intuitive syntax:
import numpy as np, pandas as pd
A = np.array([[1,2,5,6],
[1,3,6,7],
[1,2,3,4],
[2,3,9,8]])
res = pd.DataFrame(A).groupby([0, 1], sort=False).sum()\
.reset_index().values
print(res)
[[ 1  2  8 10]
 [ 1  3  6  7]
 [ 2  3  9  8]]
Upvotes: 1