Summary stats for wide data

Question

Is there an efficient way to calculate summary stats of score for each fruit, using only the rows where that fruit's column is True?

df
   comment type  score  apple  banana   pear
0     dfsd  new    0.4   True   False   True
1     sdfs  low    0.3  False    True  False
2   sdddfs  low    0.2  False    True  False
3    sdsfs  low    0.8   True    True  False
4     ddds  low    0.1   True    True   True

...

I've tried:

fruits = ['apple','banana','pear']

for fruit in fruits:
    # select the rows where this fruit's column is True
    df1 = df.loc[df[fruit], :]
    print(df1.describe())

Expected Output:

fruit
        count     mean_score   std_score  
apple               
banana              
pear

Shubham Sharma · Accepted Answer

Select the required fruit columns, then for each one take the score column and mask the rows where the fruit is False. Finally, use describe to get the descriptive statistics, which ignore the masked (NaN) values:

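# fruits = ['apple', 'banana', 'pear'], as defined in the question
s = ['count', 'mean', 'std']
# for each fruit column, keep score where the flag is True and set NaN elsewhere
stats = df[fruits].apply(lambda m: df['score'].mask(~m)).describe().T[s]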

print(stats)

        count      mean       std
apple     3.0  0.433333  0.351188
banana    4.0  0.350000  0.310913
pear      2.0  0.250000  0.212132
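
For comparison, here is a minimal self-contained sketch of a different route to the same numbers: reshape to long format with melt, keep the True rows, and aggregate with groupby. This is not the approach above but an alternative; the names `long` and `present` are made up for illustration, and the frame is rebuilt from the five sample rows shown in the question (the elided rows are not included).

```python
import pandas as pd

# Sample data copied from the five rows shown in the question.
df = pd.DataFrame({
    "comment": ["dfsd", "sdfs", "sdddfs", "sdsfs", "ddds"],
    "type": ["new", "low", "low", "low", "low"],
    "score": [0.4, 0.3, 0.2, 0.8, 0.1],
    "apple": [True, False, False, True, True],
    "banana": [False, True, True, True, True],
    "pear": [True, False, False, False, True],
})
fruits = ["apple", "banana", "pear"]

# Reshape to long format: one row per (original row, fruit) pair.
# "present" is a hypothetical column name holding the True/False flag.
long = df.melt(id_vars="score", value_vars=fruits,
               var_name="fruit", value_name="present")

# Keep only the pairs where the flag is True, then aggregate score per fruit.
stats = (
    long[long["present"].astype(bool)]
    .groupby("fruit")["score"]
    .agg(["count", "mean", "std"])
)
print(stats)
```

One visible difference: here count comes back as an integer, whereas describe reports it as a float. Which version is faster likely depends on the number of rows and fruit columns, so it is worth timing both on the real data.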
