Reputation: 1667
I would like to compare statistics such as mean, std, etc. of my dataset, conditional on a dummy variable. I saw a command for that somewhere but cannot remember it (and my Google search was not successful). I would like to produce an output like this:
dummy mean(var1) mean(var2)
0 1.5 3
1 10 10
Maybe something with groupby??
Here is a minimal example:
import pandas as pd

# One record per observation; 'dummy' is the grouping variable
dict1 = [{'dummy': '0', 'var1': 1, 'var2': 2},
         {'dummy': '0', 'var1': 2, 'var2': 4},
         {'dummy': '1', 'var1': 5, 'var2': 8},
         {'dummy': '1', 'var1': 15, 'var2': 12}]
df = pd.DataFrame(dict1, index=['s1', 's2', 's3', 's4'])
Upvotes: 1
Views: 73
Reputation: 59274
I believe you want groupby + describe:
ndf = df.groupby('dummy').describe()
Then just select whichever statistics you want:
ndf.loc[:, ndf.columns.get_level_values(1)=='mean']
var1 var2
mean mean
dummy
0 1.5 3.0
1 10.0 10.0
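As a possible alternative to the boolean mask, a cross-section on the second column level should pick out the same statistic. This is only a sketch: it rebuilds the question's data under the name `records` (my naming, not the asker's) and assumes `describe()` returns columns as a (variable, statistic) MultiIndex, as in the output above.

```python
import pandas as pd

# Same data as in the question; `records` is an illustrative name.
records = [
    {'dummy': '0', 'var1': 1, 'var2': 2},
    {'dummy': '0', 'var1': 2, 'var2': 4},
    {'dummy': '1', 'var1': 5, 'var2': 8},
    {'dummy': '1', 'var1': 15, 'var2': 12},
]
df = pd.DataFrame(records, index=['s1', 's2', 's3', 's4'])

ndf = df.groupby('dummy').describe()

# Take the 'mean' slice from column level 1; the statistic level is
# dropped, leaving plain 'var1' / 'var2' columns.
means = ndf.xs('mean', axis=1, level=1)
print(means)
```

Unlike the mask, this drops the statistic level from the column labels, which may or may not be what you want.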
describe might be more powerful because it gives you many different statistics up front. But of course you can call the specific methods you need, e.g.:
df.groupby('dummy').mean()
var1 var2
dummy
0 1.5 3.0
1 10.0 10.0
df.groupby('dummy').std()
var1 var2
dummy
0 0.707107 1.414214
1 7.071068 2.828427
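If you want several statistics in one table without computing everything `describe` returns, `agg` with a list of function names is one option. The sketch below rebuilds the question's data (the names `records` and `stats` are mine) and flattens the column labels into the `mean(var1)` style the question asked for; the flattening step is optional.

```python
import pandas as pd

# Same data as in the question; `records` is an illustrative name.
records = [
    {'dummy': '0', 'var1': 1, 'var2': 2},
    {'dummy': '0', 'var1': 2, 'var2': 4},
    {'dummy': '1', 'var1': 5, 'var2': 8},
    {'dummy': '1', 'var1': 15, 'var2': 12},
]
df = pd.DataFrame(records, index=['s1', 's2', 's3', 's4'])

# Compute only the requested statistics per group; the result has
# (variable, statistic) MultiIndex columns.
stats = df.groupby('dummy').agg(['mean', 'std'])

# Optional: flatten the MultiIndex into labels such as "mean(var1)".
stats.columns = [f'{stat}({var})' for var, stat in stats.columns]
print(stats)
```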
Upvotes: 2