asd

Reputation: 1309

Summary stats for wide data

Is there an efficient way to calculate summary stats of the score column for each fruit, using only the rows where that fruit's column is True?

df  comment  type  score  apple  banana   pear
0      dfsd   new    0.4   True   False   True
1      sdfs   low    0.3  False    True  False
2    sdddfs   low    0.2  False    True  False
3     sdsfs   low    0.8   True    True  False
4      ddds   low    0.1   True    True   True

... 
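To make the snippets on this page reproducible, the five sample rows above can be typed in as a DataFrame. This is a sketch covering only the rows shown, not the ones elided by "...":

```python
import pandas as pd

# The five sample rows from the question; the fruit columns are real booleans
df = pd.DataFrame({
    'comment': ['dfsd', 'sdfs', 'sdddfs', 'sdsfs', 'ddds'],
    'type': ['new', 'low', 'low', 'low', 'low'],
    'score': [0.4, 0.3, 0.2, 0.8, 0.1],
    'apple': [True, False, False, True, True],
    'banana': [False, True, True, True, True],
    'pear': [True, False, False, False, True],
})
print(df)
```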

I've tried:

fruits = ['apple','banana','pear']

for fruit in fruits:
    # boolean-index the rows where this fruit's column is True
    df1 = df.loc[df[fruit], :]
    print(df1.describe())
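A minimal sketch of the same loop idea: boolean-index df with df[fruit] for each fruit and collect that subset's score stats into a single table shaped like the expected output. The sample rows are typed in from the question; the column names mean_score and std_score come from the expected output, and the rows dict is just an illustrative helper.

```python
import pandas as pd

df = pd.DataFrame({
    'comment': ['dfsd', 'sdfs', 'sdddfs', 'sdsfs', 'ddds'],
    'type': ['new', 'low', 'low', 'low', 'low'],
    'score': [0.4, 0.3, 0.2, 0.8, 0.1],
    'apple': [True, False, False, True, True],
    'banana': [False, True, True, True, True],
    'pear': [True, False, False, False, True],
})
fruits = ['apple', 'banana', 'pear']

# For each fruit, keep only the rows where its flag is True and aggregate the score
rows = {}
for fruit in fruits:
    rows[fruit] = df.loc[df[fruit], 'score'].agg(['count', 'mean', 'std'])

# dict of Series -> one column per fruit; transpose so each fruit becomes a row
stats = pd.DataFrame(rows).T
stats.index.name = 'fruit'
stats.columns = ['count', 'mean_score', 'std_score']
print(stats)
```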

Expected Output:

        count  mean_score  std_score
fruit
apple
banana
pear

Upvotes: 3

Views: 144

Answers (2)

jeremy_rutman

Reputation: 5788

A general way to do this, one that doesn't rely on df.describe happening to include the stats you are after, is:

df2 = df.groupby(['apple','banana','pear']).agg({'score':['count','mean','std']})

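will get you the count, mean, and std of the score for each combination of fruit flags. Note that this is one row per distinct (apple, banana, pear) pattern, not one row per fruit: a row that is True for several fruits lands in a single combined group.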
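A quick way to see what grouping on all three flags produces is to run it on the five sample rows from the question (typed in by hand here). This sketch aggregates score, since type holds strings:

```python
import pandas as pd

df = pd.DataFrame({
    'comment': ['dfsd', 'sdfs', 'sdddfs', 'sdsfs', 'ddds'],
    'type': ['new', 'low', 'low', 'low', 'low'],
    'score': [0.4, 0.3, 0.2, 0.8, 0.1],
    'apple': [True, False, False, True, True],
    'banana': [False, True, True, True, True],
    'pear': [True, False, False, False, True],
})

# One group per distinct (apple, banana, pear) pattern; each row falls in exactly one group
df2 = df.groupby(['apple', 'banana', 'pear']).agg({'score': ['count', 'mean', 'std']})
print(df2)
```

With these rows there are four patterns, and single-row groups get a NaN std.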

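In response to a comment, you can relabel the index so each row is named by the fruit(s) that are True in it:

df2.index = ['+'.join(name for name, flag in zip(df2.index.names, key) if flag) for key in df2.index]

When every row has exactly one True fruit column, this gives just one fruit per row; otherwise labels such as apple+pear mark the combined groups.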
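If you want strictly one row per fruit, a different technique (not part of this answer) is to reshape to long format with melt, keep only the True flags, and group by fruit. A minimal sketch on the sample rows, where the column names fruit and flag are illustrative choices:

```python
import pandas as pd

df = pd.DataFrame({
    'comment': ['dfsd', 'sdfs', 'sdddfs', 'sdsfs', 'ddds'],
    'type': ['new', 'low', 'low', 'low', 'low'],
    'score': [0.4, 0.3, 0.2, 0.8, 0.1],
    'apple': [True, False, False, True, True],
    'banana': [False, True, True, True, True],
    'pear': [True, False, False, False, True],
})
fruits = ['apple', 'banana', 'pear']

# Long format: one (score, fruit, flag) row per original row and fruit column
long = df.melt(id_vars='score', value_vars=fruits, var_name='fruit', value_name='flag')

# Keep the rows where the fruit is present, then summarize the score per fruit
stats = long[long['flag']].groupby('fruit')['score'].agg(['count', 'mean', 'std'])
print(stats)
```

Unlike grouping on all three flags at once, a row that is True for several fruits is counted once under each of them here.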

Upvotes: 0

Shubham Sharma

Reputation: 71707

Select the required fruit columns, then for each fruit column take the corresponding score and mask it wherever that fruit is False; finally use describe to get the descriptive statistics (this reuses df and fruits from the question):

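s = ['count', 'mean', 'std']
# score where the fruit flag is True, NaN elsewhere; describe() skips NaN
# and .T turns the result into one row per fruit
stats = df[fruits].apply(lambda m: df['score'].mask(~m)).describe().T[s]

print(stats)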

        count      mean       std
apple     3.0  0.433333  0.351188
banana    4.0  0.350000  0.310913
pear      2.0  0.250000  0.212132
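To see why describe ignores the non-matching rows, it can help to inspect the masked frame before describing it: mask(~m) sets score to NaN wherever the fruit flag is False, and count, mean, and std skip NaN by default. A sketch on the sample rows from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'comment': ['dfsd', 'sdfs', 'sdddfs', 'sdsfs', 'ddds'],
    'type': ['new', 'low', 'low', 'low', 'low'],
    'score': [0.4, 0.3, 0.2, 0.8, 0.1],
    'apple': [True, False, False, True, True],
    'banana': [False, True, True, True, True],
    'pear': [True, False, False, False, True],
})
fruits = ['apple', 'banana', 'pear']

# apply() passes each fruit column (a boolean Series) as m; ~m is True where the fruit is absent,
# so mask() replaces those scores with NaN and keeps the rest
masked = df[fruits].apply(lambda m: df['score'].mask(~m))
print(masked)
```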

Upvotes: 3
