Reputation: 7587
I have a DataFrame
, where all columns are in binary
form, i.e; taking either 0
or 1
as value as shown below.
import pandas as pd
df = pd.DataFrame({'qualified_exam':[1,1,0,0,1,1],'gender_M':[1,0,1,0,1,1],'employed':[1,0,0,1,1,1],'married':[0,1,0,1,1,0]})
print(df)
qualified_exam gender_M employed married
0 1 1 1 0
1 1 0 0 1
2 0 1 0 0
3 0 0 1 1
4 1 1 1 1
5 1 1 1 0
I want to create a DataFrame
where I want to measure sum/mean
of column qualified _exam
by grouping by
all the remaining 3 columns individually - gender_M
, employed
, married
.
Final DataFrame should look something like this -
sum_0 sum_1 mean_0 mean_1
gender_M 1 3 0.50 0.75
employed 1 3 0.50 0.75
married 2 2 0.66 0.66
I tried it by doing groupby()
and agg()
for each of the 3 columns individually, and then appending the result one by one. This is too cumbersome. I am sure there is a better way.
Upvotes: 1
Views: 33
Reputation: 862681
Use DataFrame.melt
with aggregate sum
with mean
, reshape by DataFrame.unstack
and last flatten MultiIndex in columns
:
df1 = (df.melt('qualified_exam')
.groupby(['variable', 'value'])['qualified_exam']
.agg(['sum','mean'])
.unstack()
)
df1.columns = df1.columns.map(lambda x: f'{x[0]}_{x[1]}')
print (df1)
sum_0 sum_1 mean_0 mean_1
variable
employed 1 3 0.500000 0.750000
gender_M 1 3 0.500000 0.750000
married 2 2 0.666667 0.666667
Or use DataFrame.pivot_table
:
df1 = (df.melt('qualified_exam')
.pivot_table(index='variable',
columns='value',
values='qualified_exam',
aggfunc=('sum','mean')))
df1.columns = df1.columns.map(lambda x: f'{x[0]}_{x[1]}')
print (df1)
mean_0 mean_1 sum_0 sum_1
variable
employed 0.500000 0.750000 1.0 3.0
gender_M 0.500000 0.750000 1.0 3.0
married 0.666667 0.666667 2.0 2.0
Upvotes: 2