Reputation: 4253
I have a dataframe that looks like
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)), columns=['w', 'v1', 'v2', 'v3'])
df['x'] = np.random.choice(a=[False, True], size=10, p=[0.5, 0.5])
I would like to get a dataframe equal to
df.groupby('x').describe()
except that I would like to have the weighted mean
df.groupby(['x']).apply(lambda x: np.average(x['v1'], weights=x['w'], axis=0))
and as an additional column 'std'/('count'-1)
When I try
df.groupby(['x']).apply(lambda x: np.average(x[['v1','v2','v3']], weights=x['w'], axis=0))
I get a dataframe with 1 column containing a list of the 3 values instead of 3 columns.
How can I get this all neatly into a regular dataframe?
Upvotes: 0
Views: 277
Reputation: 863166
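Return a pd.Series from the function so that apply builds a DataFrame. If you need to add the result to the describe output, first add a new level to the columns so they form a MultiIndex, and then join: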
df1 = df.groupby('x').describe()
w = df.groupby(['x']).apply(lambda x: pd.Series(np.average(x[['v1','v2','v3']],
weights=x['w'], axis=0), index=['v1','v2','v3']))
w.columns = [w.columns, ['w_mean'] * len(w.columns)]
print (w)
             v1        v2        v3
         w_mean    w_mean    w_mean
x
False  4.047619  2.142857  4.714286
True   4.750000  3.937500  3.250000
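As a minimal, self-contained sketch of why returning a pd.Series helps: the toy data and the helper name weighted_means below are illustrative assumptions, not part of the original post. np.average returns a plain array, and wrapping it in a Series with a labelled index lets apply spread the values across one column per label.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical), small enough to check the results by hand.
df = pd.DataFrame({
    "w":  [1, 2, 3, 4],
    "v1": [10, 20, 30, 40],
    "v2": [1, 1, 3, 3],
    "x":  [False, False, True, True],
})
value_cols = ["v1", "v2"]


def weighted_means(group):
    # np.average gives one weighted mean per value column as a plain array;
    # the labelled Series tells apply() which column each value belongs to.
    values = np.average(group[value_cols], weights=group["w"], axis=0)
    return pd.Series(values, index=value_cols)


# Selecting the needed columns first keeps the grouping key out of the function.
w = df.groupby("x")[["w"] + value_cols].apply(weighted_means)
print(w)
```

For the False group, v1 works out to (1*10 + 2*20) / 3, roughly 16.67, which is an easy way to confirm the weights are being applied per group.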
df1 = df1.join(w).sort_index(axis=1)
print (df1)
         v1                                                              v2  \
        25%  50%   75%  count  max      mean  min       std    w_mean   25%
x
False  2.25  3.5  6.25    6.0  9.0  4.333333  1.0  3.076795  4.047619  2.00
True   1.75  4.5  7.50    4.0  9.0  4.750000  1.0  3.862210  4.750000  2.75

                  v3               w                                    \
                 std    w_mean   25%  50%   75%  count  max  mean  min
x      ...
False  ...  3.271085  4.714286  6.50  8.0  8.75    6.0  9.0   7.0  2.0
True   ...  3.109126  3.250000  0.75  3.5  6.75    4.0  9.0   4.0  0.0

            std
x
False  2.683282
True   4.242641

[2 rows x 35 columns]
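The question also asks for an extra column 'std'/('count'-1), which the code above does not add. One possible sketch, under stated assumptions: the toy data is made up, the column label std_over_n1 is a hypothetical name, and the expression is read literally as the describe 'std' divided by ('count' - 1) for each value column.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical); three rows per group so that count - 1 is nonzero.
df = pd.DataFrame({
    "w":  [1, 2, 3, 4, 5, 6],
    "v1": [2, 4, 6, 1, 3, 8],
    "v2": [5, 5, 2, 7, 9, 1],
    "x":  [False, False, False, True, True, True],
})
value_cols = ["v1", "v2"]

# Standard describe output: columns are a MultiIndex of (column, statistic).
desc = df.groupby("x")[value_cols].describe()

# Weighted means, one column per value column, then a second column level
# so the labels line up with the describe output.
w = df.groupby("x")[["w"] + value_cols].apply(
    lambda g: pd.Series(np.average(g[value_cols], weights=g["w"], axis=0),
                        index=value_cols)
)
w.columns = pd.MultiIndex.from_product([w.columns, ["w_mean"]])

# Extra statistic taken literally from the question: std / (count - 1).
std = desc.xs("std", axis=1, level=1)
count = desc.xs("count", axis=1, level=1)
ratio = std / (count - 1)
ratio.columns = pd.MultiIndex.from_product([ratio.columns, ["std_over_n1"]])

# Join everything on the group index and sort so each column's stats sit together.
result = desc.join(w).join(ratio).sort_index(axis=1)
print(result)
```

If a group has a single row, count - 1 is zero and std is NaN, so the ratio comes out as NaN for that group; whether that is acceptable depends on the data.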
Upvotes: 1