Reputation: 4102
I have an ndarray like this:
import numpy as np

data = [(1, "YES", 54.234),
        (1, "YES", 1.0001),
        (2, "YES", 4.234),
        (3, "YES", 0.234)]
dtypes = [("GROUPID", np.int64),  # np.int was removed in NumPy 1.24
          ("HASNEAR", "|S255"),
          ("DISTANCE", np.float64)]
array = np.array(data, dtype=dtypes)
Is there a way to group the data by GROUPID and return only the row with the minimum DISTANCE in each group, as a new array?
In my example, I have 4 rows. After grouping and taking the minimum, I would expect only 3 rows to be returned, one for each GROUPID value.
If numpy arrays aren't the right tool, could you do this in Pandas?
Thank you
Upvotes: 1
Views: 739
Reputation: 10759
As illustrated by others, you can do this in pandas, but it is a relatively heavyweight abstraction that introduces all kinds of other complexities that you may or may not be interested in.
The numpy_indexed package specializes in this kind of operation in isolation:
import numpy_indexed as npi
# index the structured array, not the original list of tuples
npi.group_by(array['GROUPID']).min(array['DISTANCE'])
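If adding a dependency is not an option, the same grouping can be approximated with plain NumPy. The following is a minimal sketch, not part of numpy_indexed: it sorts by GROUPID and then DISTANCE, and keeps the first row of each group. The names `order`, `sorted_rows`, `first` and `result` are introduced here for illustration only.

```python
import numpy as np

data = [(1, "YES", 54.234),
        (1, "YES", 1.0001),
        (2, "YES", 4.234),
        (3, "YES", 0.234)]
dtypes = [("GROUPID", np.int64),
          ("HASNEAR", "S255"),
          ("DISTANCE", np.float64)]
array = np.array(data, dtype=dtypes)

# np.lexsort uses the LAST key as the primary sort key, so this sorts
# by GROUPID first and by DISTANCE within each group.
order = np.lexsort((array["DISTANCE"], array["GROUPID"]))
sorted_rows = array[order]

# After sorting, the first occurrence of each GROUPID is the row with
# the smallest DISTANCE in that group.
_, first = np.unique(sorted_rows["GROUPID"], return_index=True)
result = sorted_rows[first]
```

Unlike the `min` call above, which returns the group keys and the reduced DISTANCE values, this keeps the whole structured row, so `result` has the same dtype as `array`.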
Upvotes: 2
Reputation: 1365
Create a pandas DataFrame, group by GROUPID and aggregate with min():
import pandas as pd
df = pd.DataFrame(data, columns=('GROUPID','HASNEAR','DISTANCE'))
df.groupby('GROUPID').min()
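One caveat worth checking: `groupby(...).min()` reduces each column independently, so the result is not guaranteed to be an actual row of the input. The sketch below uses hypothetical data (HASNEAR differs within group 1, which is not the case in the question) to show the difference from selecting whole rows with `idxmin`.

```python
import pandas as pd

# Hypothetical data: HASNEAR varies inside group 1.
df = pd.DataFrame(
    [(1, "NO", 54.234), (1, "YES", 1.0001), (2, "YES", 4.234)],
    columns=("GROUPID", "HASNEAR", "DISTANCE"),
)

# Column-wise minimum: HASNEAR and DISTANCE are reduced separately,
# so group 1 pairs "NO" with 1.0001, a combination that is in no row.
per_column = df.groupby("GROUPID").min()

# Row-wise selection: keep the full row that holds the smallest DISTANCE.
per_row = df.loc[df.groupby("GROUPID")["DISTANCE"].idxmin()]
```

With the data from the question every HASNEAR value is identical, so both approaches give the same rows there.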
Upvotes: 2
Reputation: 393933
IIUC you can do this in pandas:
In [8]:
import pandas as pd
# construct a df
df = pd.DataFrame(array)
df
Out[8]:
   GROUPID HASNEAR  DISTANCE
0        1  b'YES'   54.2340
1        1  b'YES'    1.0001
2        2  b'YES'    4.2340
3        3  b'YES'    0.2340
You can now group by the GROUPID column, call idxmin to get the index label of the minimum value in the column of interest, and use the result to filter the original df:
In [9]:
df.loc[df.groupby('GROUPID')['DISTANCE'].idxmin()]
Out[9]:
   GROUPID HASNEAR  DISTANCE
1        1  b'YES'    1.0001
2        2  b'YES'    4.2340
3        3  b'YES'    0.2340
You can see that idxmin returns the index label of the minimum value in each group:
In [10]:
df.groupby('GROUPID')['DISTANCE'].idxmin()
Out[10]:
GROUPID
1 1
2 2
3 3
Name: DISTANCE, dtype: int64
You can convert back to a numpy array by calling .values:
In [11]:
df.loc[df.groupby('GROUPID')['DISTANCE'].idxmin()].values
Out[11]:
array([[1, b'YES', 1.0001],
[2, b'YES', 4.234],
[3, b'YES', 0.234]], dtype=object)
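Note that `.values` collapses everything into a single object-dtype array. If named fields and per-column dtypes matter, `DataFrame.to_records` may be a closer match to the original structured array. This is a sketch under that assumption; the names `closest` and `records` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

array = np.array(
    [(1, "YES", 54.234), (1, "YES", 1.0001), (2, "YES", 4.234), (3, "YES", 0.234)],
    dtype=[("GROUPID", np.int64), ("HASNEAR", "S255"), ("DISTANCE", np.float64)],
)
df = pd.DataFrame(array)
closest = df.loc[df.groupby("GROUPID")["DISTANCE"].idxmin()]

# to_records returns a record array with one field per column;
# index=False leaves the DataFrame index out of the result.
records = closest.to_records(index=False)
```

The numeric fields keep their dtypes, so `records["DISTANCE"]` can be used directly in further NumPy computations.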
Upvotes: 1