how to merge a groupby output (series) back to the dataframe

Question

I've a data frame (portfolios) like this

Date    ReturnonEquity
31-Jan-10   0.67
31-Jan-10   1.00
31-Jan-10   0.49
31-Jan-10   0.15
28-Feb-10   0.80
28-Feb-10   0.08
28-Feb-10   0.54
28-Feb-10   0.77

I calculated the winsorized Roe for every date -

portfolios.groupby(by='Date')['ROE'].apply(lambda x: zscore(mstats.winsorize(x[~np.isnan(x)],0.03)))

which outputs a series like this (ignore values)

2010-01-31   [0.67, 1.00, 0.49, 0.15]
2010-02-28    [0.80, 0.08, 0.54, 0.77]

how do i put the z-scores back in the portfolios as a new column (join creates the columns with the array, not like a continuous values)

Date ReturnonEquity RoE_winsorized

31-Jan-10   0.67              0.67
31-Jan-10   1.00              1.00
31-Jan-10   0.49              0.49
31-Jan-10   0.15              0.15
28-Feb-10   0.80              0.80
28-Feb-10   0.08              0.08
28-Feb-10   0.54              0.54
28-Feb-10   0.77              0.77

Thanks in advance!

unutbu · Accepted Answer

Use transform (instead of apply), as chrisb suggests, and assign the result to portfolios[columname]:

In [42]: portfolios['windsorized'] = portfolios.groupby(by='Date')['ROE'].transform(lambda x: (stats.zscore(mstats.winsorize(x[~np.isnan(x)],0.03))))

In [43]: portfolios
Out[43]: 
        Date   ROE  windsorized
0 2010-01-31  0.67     0.301112
1 2010-01-31  1.00     1.375348
2 2010-01-31  0.49    -0.284835
3 2010-01-31  0.15    -1.391624
4 2010-02-28  0.80     0.876605
5 2010-02-28  0.08    -1.623022
6 2010-02-28  0.54    -0.026038
7 2010-02-28  0.77     0.772454

how to merge a groupby output (series) back to the dataframe

Answers (1)

Related Questions