bikuser

Reputation: 2103

Calculate the mean only when the number of values in each row is higher than a certain number in Python pandas

I have a daily time series DataFrame with nine columns. Each column represents the measurements from a different method. I want to calculate the daily mean only when a row has more than two measurements; otherwise the result should be NaN. How can I do that with a pandas DataFrame?

Suppose my df looks like:

               0     1     2     3     4     5     6     7     8
2000-02-25   NaN  0.22  0.54   NaN   NaN   NaN   NaN   NaN   NaN
2000-02-26  0.57   NaN  0.91  0.21   NaN  0.22   NaN  0.51   NaN
2000-02-27  0.10  0.14  0.09   NaN  0.17   NaN  0.05   NaN   NaN
2000-02-28   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  0.14
2000-02-29  0.82   NaN  0.75   NaN   NaN   NaN  0.14   NaN   NaN

and I'm expecting mean values like:

             0    
2000-02-25  NaN   
2000-02-26  0.48  
2000-02-27  0.11  
2000-02-28  NaN   
2000-02-29  0.57  

Upvotes: 2

Views: 40

Answers (1)

jezrael

Reputation: 863301

Use DataFrame.where to replace rows with NaN according to a condition. Build the condition with DataFrame.count(axis=1), which counts the non-NaN values in each row, and compare it with Series.gt (the method form of >):

s = df.where(df.count(axis=1).gt(2)).mean(axis=1)
# alternative solution with the order changed: take the mean first, then mask
#s = df.mean(axis=1).where(df.count(axis=1).gt(2))
print(s)
2000-02-25      NaN
2000-02-26    0.484
2000-02-27    0.110
2000-02-28      NaN
2000-02-29    0.570
dtype: float64
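
For anyone who wants to run this end to end, here is a minimal, self-contained sketch. It rebuilds the sample frame from the question and wraps the "mean first, then mask" variant in a small helper. The helper name row_mean_min_count and its min_count parameter are made up for illustration and are not part of pandas; "more than two" from the question is expressed here as min_count=3.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (NaN marks a missing measurement).
nan = np.nan
df = pd.DataFrame(
    [
        [nan, 0.22, 0.54, nan, nan, nan, nan, nan, nan],
        [0.57, nan, 0.91, 0.21, nan, 0.22, nan, 0.51, nan],
        [0.10, 0.14, 0.09, nan, 0.17, nan, 0.05, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, 0.14],
        [0.82, nan, 0.75, nan, nan, nan, 0.14, nan, nan],
    ],
    index=pd.date_range("2000-02-25", periods=5, freq="D"),
)


def row_mean_min_count(frame, min_count):
    """Row-wise mean; NaN for rows with fewer than min_count non-NaN values.

    Hypothetical helper for illustration, not a pandas function.
    """
    counts = frame.count(axis=1)  # non-NaN values per row
    # mean(axis=1) skips NaN by default; where() keeps only rows that qualify
    return frame.mean(axis=1).where(counts.ge(min_count))


# "More than two measurements" is the same as "at least three".
s = row_mean_min_count(df, min_count=3)
print(s.round(3))
```

With min_count=3 this should give the same Series as the one-liner above. Making the threshold a parameter is only a convenience in case the cutoff changes later.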

Upvotes: 6
