Reputation: 912
I want to apply a conditional function to data grouped by day: for each column in which more than half of the values on a given day are equal to 0, set all of that column's values for that day to np.nan.
date,value1,value2
2016-01-01 09:00:00,14,14
2016-01-01 10:00:00,12,13
2016-01-01 11:00:00,11,13
2016-01-01 12:00:00,11,9
2016-01-01 13:00:00,17,21
2016-01-01 14:00:00,9,22
2016-01-01 15:00:00,10,9
2016-01-01 16:00:00,11,9
2016-01-01 17:00:00,8,8
2016-01-01 18:00:00,4,2
2016-01-01 19:00:00,5,7
2016-01-01 20:00:00,5,5
2016-01-01 21:00:00,3,4
2016-01-01 22:00:00,2,4
2016-01-01 23:00:00,2,4
2016-01-02 09:00:00,0,0
2016-01-02 10:00:00,0,0
2016-01-02 11:00:00,0,0
2016-01-02 12:00:00,0,0
2016-01-02 13:00:00,1,0
2016-01-02 14:00:00,0,0
2016-01-02 15:00:00,0,0
2016-01-02 16:00:00,0,0
2016-01-02 17:00:00,0,0
2016-01-02 18:00:00,0,0
2016-01-02 19:00:00,0,0
2016-01-02 20:00:00,1,0
2016-01-02 21:00:00,0,0
2016-01-02 22:00:00,0,0
2016-01-02 23:00:00,0,0
Desired output:
date,value1,value2
2016-01-01 09:00:00,14,14
2016-01-01 10:00:00,12,13
2016-01-01 11:00:00,11,13
2016-01-01 12:00:00,11,9
2016-01-01 13:00:00,17,21
2016-01-01 14:00:00,9,22
2016-01-01 15:00:00,10,9
2016-01-01 16:00:00,11,9
2016-01-01 17:00:00,8,8
2016-01-01 18:00:00,4,2
2016-01-01 19:00:00,5,7
2016-01-01 20:00:00,5,5
2016-01-01 21:00:00,3,4
2016-01-01 22:00:00,2,4
2016-01-01 23:00:00,2,4
2016-01-02 09:00:00,null,null
2016-01-02 10:00:00,null,null
2016-01-02 11:00:00,null,null
2016-01-02 12:00:00,null,null
2016-01-02 13:00:00,null,null
2016-01-02 14:00:00,null,null
2016-01-02 15:00:00,null,null
2016-01-02 16:00:00,null,null
2016-01-02 17:00:00,null,null
2016-01-02 18:00:00,null,null
2016-01-02 19:00:00,null,null
2016-01-02 20:00:00,null,null
2016-01-02 21:00:00,null,null
2016-01-02 22:00:00,null,null
2016-01-02 23:00:00,null,null
I have read the question "pandas apply function to data grouped by day" and tried to follow it:
df_mode = df.groupby(df.index.date).apply(lambda x: mode(x)[0])
I got the most frequent value for each day in each column. However, I don't know how to do the next step (setting all values in the column for that day to np.nan).
Also, is there a more efficient way than using apply in this case?
Thank you
Upvotes: 2
Views: 52
Reputation: 863156
Compare the values with 0, use GroupBy.transform with 'mean' to get the fraction of zeros per day in each column, and then set the values to missing with DataFrame.mask wherever that fraction is greater than 0.5:
df = df.mask(df.eq(0).groupby(df.index.date).transform('mean').gt(.5))
print (df)
value1 value2
date
2016-01-01 09:00:00 14.0 14.0
2016-01-01 10:00:00 12.0 13.0
2016-01-01 11:00:00 11.0 13.0
2016-01-01 12:00:00 11.0 9.0
2016-01-01 13:00:00 17.0 21.0
2016-01-01 14:00:00 9.0 22.0
2016-01-01 15:00:00 10.0 9.0
2016-01-01 16:00:00 11.0 9.0
2016-01-01 17:00:00 8.0 8.0
2016-01-01 18:00:00 4.0 2.0
2016-01-01 19:00:00 5.0 7.0
2016-01-01 20:00:00 5.0 5.0
2016-01-01 21:00:00 3.0 4.0
2016-01-01 22:00:00 2.0 4.0
2016-01-01 23:00:00 2.0 4.0
2016-01-02 09:00:00 NaN NaN
2016-01-02 10:00:00 NaN NaN
2016-01-02 11:00:00 NaN NaN
2016-01-02 12:00:00 NaN NaN
2016-01-02 13:00:00 NaN NaN
2016-01-02 14:00:00 NaN NaN
2016-01-02 15:00:00 NaN NaN
2016-01-02 16:00:00 NaN NaN
2016-01-02 17:00:00 NaN NaN
2016-01-02 18:00:00 NaN NaN
2016-01-02 19:00:00 NaN NaN
2016-01-02 20:00:00 NaN NaN
2016-01-02 21:00:00 NaN NaN
2016-01-02 22:00:00 NaN NaN
2016-01-02 23:00:00 NaN NaN
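To make the one-liner easier to follow, here is a rough step-by-step sketch of the same chain. The shortened sample data and the intermediate names (csv_text, is_zero, zero_share, to_blank, result) are only illustrative assumptions, not part of the original question or answer.

```python
import io

import pandas as pd

# Shortened, made-up sample in the same layout as the question's data.
csv_text = """date,value1,value2
2016-01-01 09:00:00,14,14
2016-01-01 10:00:00,12,13
2016-01-01 11:00:00,0,13
2016-01-02 09:00:00,0,0
2016-01-02 10:00:00,0,0
2016-01-02 11:00:00,1,0
"""
df = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)

# Step 1: boolean frame, True where a value equals 0.
is_zero = df.eq(0)

# Step 2: per day and per column, the mean of the booleans is the share of
# zeros; transform broadcasts that share back onto every row of the day.
zero_share = is_zero.groupby(df.index.date).transform("mean")

# Step 3: True for every cell whose day/column has more than half zeros.
to_blank = zero_share.gt(0.5)

# Step 4: mask replaces the cells where the condition is True with NaN.
result = df.mask(to_blank)
print(result)
```

Note that gt(.5) is a strict comparison, so a day with exactly half zeros in a column is left unchanged; ge(.5) would blank it as well. Because the values that are kept share a column with NaN, the columns become float, which is why the output above shows 14.0 instead of 14.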
Upvotes: 4