Z_D
Z_D

Reputation: 817

Pandas Groupby: Aggregate and Conditional

I am grouping by item-date pairs in a PD dataframe and would like to add some custom conditional functions using lambda to a larger aggregation function.

Using the tip here, I can do the following, which works properly and counts positive and negative values in the given column.

item_day_count=item_day_group['PriceDiff_pct'].agg({'Pos':lambda val: (val > 0).sum(),'Neg':lambda val: (val <= 0).sum()}).reset_index()

And I can also do a different aggregate that contains both pre-built aggregations and a custom percentile function that returns proper stats:

item_day_count_v2=item_day_group['PriceDiff_pct'].agg(['count','min',percentile(25),'mean','median',percentile(75),'max']).reset_index()

But I cannot figure out how to combine these into one larger function - I get the error : AttributeError: 'DataFrameGroupBy' object has no attribute 'name' when I try the below:

item_day_count_v3=item_day_group['PriceDiff_pct'].agg(['count',{'Pos_Return':lambda val: (val > 0).sum(),'Neg_Return':lambda val: (val <= 0).sum()},'min',percentile(25),'mean','median',percentile(75),'max']).reset_index() 

Does anyone know how to combine these functions? Seems like it I am close considering both work separately. Thanks for your help!

Upvotes: 4

Views: 11897

Answers (3)

iDrwish
iDrwish

Reputation: 3103

I don't advise combining a defined func in a dict and native aggregators like that. You can pass them as a list of tuples with function name and function as follows:

item_day_count_v3 = item_day_group['PriceDiff_pct'].agg([
    ('Count', 'count'), 
    ('Pos_Return', lambda val: (val > 0).sum()), 
    ('Neg_Return', lambda val: (val < 0).sum()), 
    ('Mean', 'mean'), 
    ('Median', 'median'), 
    ('25%Percntile', percentile(25)), 
    ('75%Percntile', percentile(75)), 
    ('Max', 'max')
]).reset_index()

Function name will be the column name.

Upvotes: 2

floydn
floydn

Reputation: 1131

As other have said, you cannot mix named functions with a dict in the agg() method.

Here's a practical way to do want you want. Let's make up some data.

df = pd.DataFrame({'A':['x', 'y']*3,
                   'B':[10,20,30,40,50,60]})

df
Out[38]: 
   A   B
0  x  10
1  y  20
2  x  30
3  y  40
4  x  50
5  y  60

Define a function to count values greater than or equal to 30.

def ge30(x):
    return (x>=30).sum()

Now use your custom func in the groupby().agg().

df.groupby('A').agg(['sum', 'mean', ge30])
Out[40]: 
     B          
   sum mean ge30
A               
x   90   30    2
y  120   40    2

Upvotes: 0

Victor Valente
Victor Valente

Reputation: 811

From pandas docs on the aggregate() method:

Accepted Combinations are:

  • string function name

  • function

  • list of functions

  • dict of column names -> functions (or list of functions)

I would say it doesn't support all combinations, though.

So, you can try this:

Get everything in a dict first, then agg using that dict.

# The functions to agg on every column.
agg_dict = dict((c, ['count','min',percentile(25),'mean','median',percentile(75),'max']) for c in item_day.columns.values)

# Append to the dict the column-specific functions.
agg_dict['Pos_Return'] = lambda val: (val > 0).sum()
agg_dict['Neg_Return'] = lambda val: (val <= 0).sum()

# Agg using the dict.
item_day_group['PriceDiff_pct'].agg(agg_dict)

Upvotes: 0

Related Questions