stefanodv

Reputation: 533

groupby on a pandas data frame with a customized aggregation function

Suppose we have the data frame:

import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380.1, 370.3, 24.77, -12.55]})

I need to write a function similar to a minimum by absolute value: it must return the element closest to zero. Grouping by 'Animal', in our case it should return:

   Animal  Max Speed
0  Falcon     370.30
1  Parrot     -12.55

I tried a function like this:

def nearzero():
    absolute = [abs(number) for number in data]
    i = absolute.index(min(absolute))
    return data[i]

It should return the element at the index where the absolute value is smallest, but it does not work:

df.groupby(['Animal']).agg({'Max Speed': [nearzero]})

Is the function or the groupby call defined incorrectly?

Upvotes: 0

Views: 222

Answers (3)

jezrael

Reputation: 862671

I think you need GroupBy.idxmin to get the index label of the minimum per group: first convert the Max Speed column to absolute values, then group by Animal and call idxmin, and finally use loc to select the matching rows:

df = df.loc[df['Max Speed'].abs().groupby(df['Animal']).idxmin()]
print(df)
   Animal  Max Speed
1  Falcon     370.30
3  Parrot     -12.55
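To make the one-liner easier to follow, here is the same idea split into steps on the question's data. It is only a sketch: the intermediate names abs_speed, labels and result are mine. The key point is that idxmin returns index labels rather than values, so loc can pull back the original rows with their signs intact.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380.1, 370.3, 24.77, -12.55]})

# Step 1: absolute values, keeping the original index labels 0..3
abs_speed = df['Max Speed'].abs()

# Step 2: per Animal, the index label of the smallest absolute value
# (here Falcon -> 1 and Parrot -> 3)
labels = abs_speed.groupby(df['Animal']).idxmin()

# Step 3: select the original rows (signed values) by those labels
result = df.loc[labels]
print(result)
```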

Another solution, with a helper column:

df['Max Speed Abs'] = df['Max Speed'].abs()
df = df.loc[df.groupby('Animal')['Max Speed Abs'].idxmin()]
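This version leaves the helper column in the result. If you only want the original columns back, one option is to drop it after selecting the rows (a sketch; the drop step is my addition, not part of the answer above):

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380.1, 370.3, 24.77, -12.55]})

# helper column with absolute values, used only to choose the rows
df['Max Speed Abs'] = df['Max Speed'].abs()

# pick the row label with the smallest absolute value per Animal,
# then drop the helper column from the selected rows
result = (df.loc[df.groupby('Animal')['Max Speed Abs'].idxmin()]
            .drop(columns='Max Speed Abs'))
print(result)
```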

EDIT: To group by multiple Series, use:

df = pd.DataFrame({'Animal' : ['Falcon', 'Falcon','Parrot', 'Parrot'],
                   'Max Speed' : [380.1, 370.3, 24.77, -12.55],
                   'Dates':['2010-10-09'] * 4})  

df = df.loc[df['Max Speed'].abs().groupby([df['Animal'], df['Dates']]).idxmin()]
print(df)
   Animal  Max Speed       Dates
1  Falcon     370.30  2010-10-09
3  Parrot     -12.55  2010-10-09
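With identical dates, as above, the result is the same as grouping by Animal alone. To see the second key take effect, here is a sketch with hypothetical dates (made up for illustration) where the two Falcon rows fall on different days, so each forms its own group and both are kept:

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380.1, 370.3, 24.77, -12.55],
                   # hypothetical dates: the Falcon rows are on different days
                   'Dates': ['2010-10-09', '2010-10-10', '2010-10-09', '2010-10-09']})

# groups are (Animal, Dates) pairs; idxmin gives one row label per pair
labels = df['Max Speed'].abs().groupby([df['Animal'], df['Dates']]).idxmin()
result = df.loc[labels]
print(result)
```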

Upvotes: 1

Valdi_Bo

Reputation: 30971

Define your function as:

def nearzero(data):
    dat = data.tolist()
    absolute = [abs(number) for number in dat]
    return dat[absolute.index(min(absolute))]

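Note that this function is called with a column (a Series) as its argument. The selection must be performed on the underlying list, because data[i] on a Series looks up the index label i, not the i-th position, and each group keeps its original labels (for example 2 and 3 for Parrot).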
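A small sketch of why positional lookup matters: each group's Series keeps the original row labels of df. An alternative to converting to a list is positional indexing with .iloc; nearzero_iloc below is a hypothetical name for that variant, not something from the answer.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380.1, 370.3, 24.77, -12.55]})

# the Parrot group keeps labels 2 and 3, so parrot[0] would raise KeyError
parrot = df.groupby('Animal')['Max Speed'].get_group('Parrot')
print(parrot.index.tolist())

def nearzero_iloc(data):
    # same logic as nearzero, but positional lookup with .iloc
    # instead of converting the Series to a list first
    absolute = [abs(number) for number in data]
    i = absolute.index(min(absolute))  # position within this group
    return data.iloc[i]

result = df.groupby('Animal')['Max Speed'].apply(nearzero_iloc)
print(result)
```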

Then call:

df.groupby(['Animal'])['Max Speed'].apply(nearzero)

A second alternative, without explicit conversion to a list:

Define the function as:

def nearzero2(data):
    return data[data.abs().idxmin()]

Then call:

df.groupby(['Animal'])['Max Speed'].apply(nearzero2)

Or, to get the result exactly as in your question:

df.groupby(['Animal']).agg({'Max Speed': nearzero2}).reset_index()
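A self-contained sketch of this last call on the question's data. Here idxmin returns an index label, and data[label] is a label lookup on the same Series, so the signed value comes back; after reset_index, Animal is a regular column again and the rows are numbered 0 and 1, as in the expected output.

```python
import pandas as pd

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380.1, 370.3, 24.77, -12.55]})

def nearzero2(data):
    # label of the smallest absolute value, then look up the signed value
    return data[data.abs().idxmin()]

result = df.groupby(['Animal']).agg({'Max Speed': nearzero2}).reset_index()
print(result)
```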

Upvotes: 1

Vaishali

Reputation: 38415

You can define a plain Python function:

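def abs_min(x):
    # compute the smallest absolute value once, then return the first
    # element that attains it (keeping its original sign)
    smallest = min(abs(x))
    for elem in x:
        if abs(elem) == smallest:
            return elem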

df.groupby('Animal')['Max Speed'].apply(abs_min)

Animal
Falcon    370.30
Parrot    -12.55

Or use a generator expression:

df.groupby('Animal')['Max Speed'].apply(lambda x: next(i for i in x if abs(i) == min(abs(x))))

Upvotes: 1
