ysearka

Reputation: 3855

Applying several functions with transform in pandas

After a groupby, agg accepts a dict of column: function pairs and applies each function to the corresponding column. This syntax doesn't work with transform, however. Is there another way to apply several functions with transform?

Let's give an example:

import numpy as np
import pandas as pd
df_test = pd.DataFrame([[1,2,3],[1,20,30],[2,30,50],[1,2,33],[2,4,50]],columns = ['a','b','c'])
Out[1]:
    a   b   c
0   1   2   3
1   1   20  30
2   2   30  50
3   1   2   33
4   2   4   50

def my_fct1(series):
    return series.mean()

def my_fct2(series):
    return series.std()

df_test.groupby('a').agg({'b':my_fct1,'c':my_fct2})

Out[2]:
    c   b
a       
1   16.522712   8
2   0.000000    17

The previous example shows how to apply different functions to different columns with agg. But if we want to transform the columns without aggregating them, agg can't be used anymore, and passing the same kind of dict to transform fails:

df_test.groupby('a').transform({'b':np.cumsum,'c':np.cumprod})
Out[3]:
TypeError: unhashable type: 'dict'

How can we perform such a transformation and get the following expected output?

    a   b   c
0   1   2   3
1   1   22  90
2   2   30  50
3   1   24  2970
4   2   34  2500

Upvotes: 7

Views: 3335

Answers (3)

sammywemmy

Reputation: 28699

With recent versions of pandas, you can use the assign method together with transform, either to append new columns or to replace existing columns with new values:

grouper = df_test.groupby("a")

df_test.assign(b=grouper["b"].transform("cumsum"), 
               c=grouper["c"].transform("cumprod"))

    a   b   c
0   1   2   3
1   1   22  90
2   2   30  50
3   1   24  2970
4   2   34  2500
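
If there are many columns, the same idea can be wrapped in a small helper that takes a dict of column: function, much like agg does. This is only a sketch: transform_columns is a hypothetical name, not a pandas function, and it assumes every value in the dict is something a per-column transform accepts (a string such as "cumsum", or a callable).

```python
import pandas as pd

def transform_columns(df, by, funcs):
    # Hypothetical helper (not part of pandas): one transform per listed column.
    grouper = df.groupby(by)
    new_cols = {col: grouper[col].transform(func) for col, func in funcs.items()}
    # assign returns a new DataFrame; columns not listed in funcs stay unchanged
    return df.assign(**new_cols)

df_test = pd.DataFrame([[1, 2, 3], [1, 20, 30], [2, 30, 50], [1, 2, 33], [2, 4, 50]],
                       columns=['a', 'b', 'c'])
result = transform_columns(df_test, "a", {"b": "cumsum", "c": "cumprod"})
print(result)
```

Because each column is transformed on its own, the original df_test is left untouched and the grouping column a is kept without any extra step.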

Upvotes: 3

Allen Qin

Reputation: 19947

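You can still use a dict, with a bit of a hack. The lambda receives each column of each group as a Series, and x.name holds the column label, so the dict lookup picks the matching result: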

df_test.groupby('a').transform(lambda x: {'b': x.cumsum(), 'c': x.cumprod()}[x.name])
Out[427]: 
    b     c
0   2     3
1  22    90
2  30    50
3  24  2970
4  34  2500

If you need to keep column a, you can do:

df_test.set_index('a')\
       .groupby('a')\
       .transform(lambda x: {'b': x.cumsum(), 'c': x.cumprod()}[x.name])\
       .reset_index()
Out[429]: 
   a   b     c
0  1   2     3
1  1  22    90
2  2  30    50
3  1  24  2970
4  2  34  2500

Another way is to use an if/else that checks the column name:

df_test.set_index('a')\
       .groupby('a')\
       .transform(lambda x: x.cumsum() if x.name=='b' else x.cumprod())\
       .reset_index()
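
A possible variation on the dict hack, sketched under the same assumption that x.name is the column label: store functions in the dict rather than already-computed results, so only the function matching the current column is called. The name funcs is just illustrative.

```python
import pandas as pd

df_test = pd.DataFrame([[1, 2, 3], [1, 20, 30], [2, 30, 50], [1, 2, 33], [2, 4, 50]],
                       columns=['a', 'b', 'c'])

# Assumed mapping of column label -> function to apply to that column
funcs = {'b': lambda s: s.cumsum(), 'c': lambda s: s.cumprod()}

# Look up the function by column name, then call it on the column
out = df_test.groupby('a').transform(lambda x: funcs[x.name](x))
print(out)
```

As in the first snippet, column a is dropped from the result; the set_index/reset_index trick above brings it back.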

Upvotes: 7

jezrael

Reputation: 862781

I think that for now (pandas 0.20.2) transform is not implemented for a dict of column names and functions the way agg is.

If the functions return a Series with the same length as the group:

df1 = df_test.set_index('a').groupby('a').agg({'b':np.cumsum,'c':np.cumprod}).reset_index()
print (df1)
   a     c   b
0  1     3   2
1  1    90  22
2  2    50  30
3  1  2970  24
4  2  2500  34

But if the functions aggregate, so that the result has a different length, a join is needed:

df2 = df_test[['a']].join(df_test.groupby('a').agg({'b':my_fct1,'c':my_fct2}), on='a')
print (df2)
   a          c   b
0  1  16.522712   8
1  1  16.522712   8
2  2   0.000000  17
3  1  16.522712   8
4  2   0.000000  17
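
For this second case, here is a sketch of an alternative that avoids the join. When transform is given a reducing function such as 'mean' or 'std', it broadcasts the per-group value back to every row of the group, so each column can be handled separately. The string names stand in for my_fct1 and my_fct2 from the question, and df3 is just an illustrative name.

```python
import pandas as pd

df_test = pd.DataFrame([[1, 2, 3], [1, 20, 30], [2, 30, 50], [1, 2, 33], [2, 4, 50]],
                       columns=['a', 'b', 'c'])

grouped = df_test.groupby('a')
# Each transform returns a Series aligned with df_test's index
df3 = df_test[['a']].assign(b=grouped['b'].transform('mean'),
                            c=grouped['c'].transform('std'))
print(df3)
```

The values should match df2 above, with the group mean of b and the group standard deviation of c repeated on every row of the group.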

Upvotes: 5
