Reputation: 1309
Is there an efficient way to calculate summary stats of score for each fruit, using only the rows where that fruit's column is True?
df
  comment type  score  apple  banana   pear
0    dfsd  new    0.4   True   False   True
1    sdfs  low    0.3  False    True  False
2  sdddfs  low    0.2  False    True  False
3   sdsfs  low    0.8   True    True  False
4    ddds  low    0.1   True    True   True
...
I've tried:
fruits = ['apple','banana','pear']
for fruit in fruits:
    # keep only the rows where this fruit's column is True
    df1 = df.loc[df[fruit], :]
    print(df1.describe())
Expected Output:
        count  mean_score  std_score
fruit
apple
banana
pear
Upvotes: 3
Views: 144
Reputation: 5788
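A general way to do this that doesn't rely on df.describe happening to include the stats you are after is:
df2 = df.groupby(['apple','banana','pear']).agg({'score':['count','mean','std']})
This gives the count, mean, and std of score for each combination of apple/banana/pear values that occurs in the data, so a row that is True for several fruits is counted once, under its combination.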
In response to a comment: if there is exactly one group per fruit, each with only that fruit set to True, you can rework the index to get just a fruit name per row with:
df2.index = [df2.index.names[i] for j in range(len(df2.index.names)) for i,x in enumerate(df2.index[j]) if x ]
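If rows can be True for more than one fruit, as in the question's sample, one possible way to still get a single row per fruit is to reshape the flags to long format before grouping. This is only a sketch, not part of the original answer: the sample frame is rebuilt from the question, and the names long, present, and per_fruit are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "comment": ["dfsd", "sdfs", "sdddfs", "sdsfs", "ddds"],
    "type": ["new", "low", "low", "low", "low"],
    "score": [0.4, 0.3, 0.2, 0.8, 0.1],
    "apple": [True, False, False, True, True],
    "banana": [False, True, True, True, True],
    "pear": [True, False, False, False, True],
})
fruits = ["apple", "banana", "pear"]

# One row per (original row, fruit) pair, keeping the score alongside the flag
long = df.melt(id_vars="score", value_vars=fruits,
               var_name="fruit", value_name="present")

# Keep only the pairs where the flag is True, then aggregate score per fruit
per_fruit = (long[long["present"].astype(bool)]
             .groupby("fruit")["score"]
             .agg(["count", "mean", "std"]))
print(per_fruit)
```

A row that is True for several fruits contributes its score to each of those fruits here, which is what the expected output in the question seems to ask for.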
Upvotes: 0
Reputation: 71707
Select the required fruits columns, then get the corresponding score for each fruit column and mask the False values. Finally, use describe to get the descriptive statistics:
s = ['count', 'mean', 'std']
stats = df[fruits].apply(lambda m: df['score'].mask(~m)).describe().T[s]
print(stats)
        count      mean       std
apple     3.0  0.433333  0.351188
banana    4.0  0.350000  0.310913
pear      2.0  0.250000  0.212132
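To see why this works, it may help to look at the intermediate masked frame. The following is a self-contained sketch that rebuilds the sample data from the question; the names masked and check are illustrative only, and the cross-check with plain boolean indexing is an addition, not part of the answer above.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "comment": ["dfsd", "sdfs", "sdddfs", "sdsfs", "ddds"],
    "type": ["new", "low", "low", "low", "low"],
    "score": [0.4, 0.3, 0.2, 0.8, 0.1],
    "apple": [True, False, False, True, True],
    "banana": [False, True, True, True, True],
    "pear": [True, False, False, False, True],
})
fruits = ["apple", "banana", "pear"]

# For each fruit column, copy score and replace it with NaN where the flag is False
masked = df[fruits].apply(lambda m: df["score"].mask(~m))
print(masked)

# describe() skips NaN, so each column is summarised over its True rows only
stats = masked.describe().T[["count", "mean", "std"]]

# Cross-check: filter score with each boolean column directly
check = pd.DataFrame(
    {f: df.loc[df[f], "score"].agg(["count", "mean", "std"]) for f in fruits}
).T
print(check)
```

Both frames should hold the same numbers. The masked version avoids an explicit Python loop over the fruits, while the boolean-indexing version may be easier to read.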
Upvotes: 3