Reputation: 533
Suppose we have the data frame:
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380.1, 370.3, 24.77, -12.55]})
I need to write a function that behaves like a minimum by absolute value: it must return the element closest to zero. Grouping by 'Animal', in our case it should return:
   Animal  Max Speed
0  Falcon     370.30
1  Parrot     -12.55
I tried a function like this:
def nearzero():
    absolute = [abs(number) for number in data]
    i = absolute.index(min(absolute))
    return data[i]
It should return the element at the index where the absolute value is smallest, but it does not work:
df.groupby(['Animal']).agg({'Max Speed': [nearzero]})
Is the function or groupby badly defined?
Upvotes: 0
Views: 222
Reputation: 862671
I think you need the groupby idxmin method (DataFrameGroupBy.idxmin / SeriesGroupBy.idxmin) to get the index of the minimum in each group. Convert the Max Speed column to absolute values with abs first, then call loc to select the rows:
df = df.loc[df['Max Speed'].abs().groupby(df['Animal']).idxmin()]
print(df)
   Animal  Max Speed
1  Falcon     370.30
3  Parrot     -12.55
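To see why this works, the one-liner can be broken into its steps. This is only an illustrative sketch on the question's sample data; the intermediate names abs_speed, idx and result are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380.1, 370.3, 24.77, -12.55]})

# Step 1: absolute values, keeping the original row labels (0..3)
abs_speed = df['Max Speed'].abs()

# Step 2: per group, the row label where the absolute value is smallest
idx = abs_speed.groupby(df['Animal']).idxmin()

# Step 3: select those rows from the original frame, so the sign is preserved
result = df.loc[idx]
print(result)
```

Because idxmin returns row labels rather than values, the final loc lookup reads the untouched Max Speed column, which is how -12.55 comes back with its sign.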
Another solution, using a new helper column:
df['Max Speed Abs'] = df['Max Speed'].abs()
df = df.loc[df.groupby('Animal')['Max Speed Abs'].idxmin()]
EDIT: To group by multiple Series, use:
df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380.1, 370.3, 24.77, -12.55],
                   'Dates': ['2010-10-09'] * 4})
df = df.loc[df['Max Speed'].abs().groupby([df['Animal'], df['Dates']]).idxmin()]
print(df)
   Animal  Max Speed       Dates
1  Falcon     370.30  2010-10-09
3  Parrot     -12.55  2010-10-09
Upvotes: 1
Reputation: 30971
Define your function as:
def nearzero(data):
    dat = data.tolist()
    absolute = [abs(number) for number in dat]
    return dat[absolute.index(min(absolute))]
Note that this function is called with a df column (a Series) as its argument. Because list.index returns a position, while a Series with an integer index is looked up by label, the selection must be performed on the underlying list.
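The difference between labels and positions can be checked on a single group. The following is a small sketch on the question's sample data; the names parrot, pos and closest are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380.1, 370.3, 24.77, -12.55]})

# The 'Parrot' group keeps the original row labels 2 and 3
parrot = df.loc[df['Animal'] == 'Parrot', 'Max Speed']
print(parrot.index.tolist())

# Position (not label) of the value closest to zero within the group
pos = parrot.abs().tolist().index(parrot.abs().min())

# parrot[pos] would look up the *label* pos, which does not exist in this
# group; positional access needs .iloc (or a plain list, as in nearzero)
closest = parrot.iloc[pos]
print(pos, closest)
```

For the 'Falcon' group the labels happen to coincide with the positions, so the mismatch only shows up from the second group onwards.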
Then call:
df.groupby(['Animal'])['Max Speed'].apply(nearzero)
The second alternative, without explicit conversion to the underlying list:
Define the function as:
def nearzero2(data):
    return data[data.abs().idxmin()]
Then call:
df.groupby(['Animal'])['Max Speed'].apply(nearzero2)
Or, to get the result exactly as in your question:
df.groupby(['Animal']).agg({'Max Speed': nearzero2}).reset_index()
Upvotes: 1
Reputation: 38415
You can define a function in Python:
def abs_min(x):
    for elem in x:
        if abs(elem) == min(abs(x)):
            return elem
df.groupby('Animal')['Max Speed'].apply(abs_min)
Animal
Falcon    370.30
Parrot    -12.55
Or use a generator expression:
df.groupby('Animal')['Max Speed'].apply(lambda x: next(i for i in x if abs(i) == min(abs(x))))
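Both versions above recompute min(abs(x)) for every element. As a possible variant (not part of the answer as originally posted), Python's built-in min accepts a key function, so the element closest to zero can be picked in one pass. The name closest_to_zero is made up for this sketch.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380.1, 370.3, 24.77, -12.55]})

def closest_to_zero(x):
    # min compares elements by abs(elem) but returns the original element,
    # so the sign is preserved
    return min(x, key=abs)

result = df.groupby('Animal')['Max Speed'].apply(closest_to_zero)
print(result)
```

On ties, min returns the first element with the smallest absolute value, which matches the behaviour of the loop in abs_min.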
Upvotes: 1