Reputation: 215
I was trying to use rolling to find the mean of the values from the previous 6 days. The following code does not ignore NaN.
import pandas as pd
import numpy as np
import datetime
xx = pd.DataFrame(list(zip([datetime.datetime.fromtimestamp(x*60*60*24*2) for x in range(0, 16, 2)], [2, 1, 3, np.nan, 4, 5, 6, 7])), columns=["datetime", "val"])
xx.set_index("datetime", inplace=True)
# 6-day time-based window; min_periods=1 allows a result from a single observation
xx.rolling('6d', min_periods=1).apply(lambda x: np.nanmean(x))
The above code gives:
                     val
datetime
1969-12-31 18:00:00  2.0
1970-01-04 18:00:00  1.5
1970-01-08 18:00:00  2.0
1970-01-12 18:00:00  NaN
1970-01-16 18:00:00  4.0
1970-01-20 18:00:00  4.5
1970-01-24 18:00:00  5.5
1970-01-28 18:00:00  6.5
However, if I drop the datetime index and use a fixed-size window instead,
xx = pd.DataFrame([2, 1, 3, np.nan, 4, 5, 6, 7], columns=["val"])
yy = xx.rolling(3, min_periods=1).apply(lambda x: np.nanmean(x))
the NaN is ignored:
   val
0  2.0
1  1.5
2  2.0
3  2.0
4  3.5
5  4.5
6  5.0
7  6.0
Any help is much appreciated!
This is a bug and was fixed here: https://github.com/pandas-dev/pandas/pull/17156
Upvotes: 0
Views: 1805
Reputation: 215
This is confirmed as a bug and was fixed here: https://github.com/pandas-dev/pandas/pull/17156
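If you are on a pandas release that includes that fix, a quick check along these lines may help confirm that time-based windows now skip NaN. This is only a sketch: the `pd.date_range` index is my own stand-in for the question's timestamps (an assumption, chosen so the result does not depend on the local timezone), and `raw=True` assumes a pandas version that supports that argument.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed 4-day spacing built with pd.date_range stands in for
# the question's timestamps; the values are the ones from the question.
idx = pd.date_range("1970-01-01", periods=8, freq="4D")
xx = pd.DataFrame({"val": [2, 1, 3, np.nan, 4, 5, 6, 7]}, index=idx)

# Built-in rolling mean: NaN is skipped, and min_periods=1 only requires
# one non-NaN observation inside each 6-day window.
builtin = xx.rolling("6D", min_periods=1).mean()

# The same computation through apply; raw=True passes each window to
# np.nanmean as a plain NumPy array.
applied = xx.rolling("6D", min_periods=1).apply(np.nanmean, raw=True)

print(builtin)
print(applied)
```

Each 6-day window here holds the current row and the one 4 days earlier, so on a fixed version the fourth row should come out as 3.0 (the mean of 3 and NaN with the NaN skipped) rather than NaN.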
Upvotes: 1
Reputation: 497
It would probably be better to interpolate your DataFrame first; alternatively, you could back-fill or forward-fill the gap with fillna().
Try this code:
xx.interpolate(inplace=True)
yy = xx.rolling('6d', min_periods=1).apply(lambda x: np.nanmean(x))
Tested, and it's working.
Found Similar Question Here
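To compare the filling options mentioned above, here is a rough sketch. The `pd.date_range` index is an illustrative stand-in for the question's timestamps (my assumption), and I use `ffill()`/`bfill()` rather than `fillna(method=...)` because the `method` argument is deprecated in recent pandas releases.

```python
import numpy as np
import pandas as pd

# Assumption: an illustrative 4-day-spaced index standing in for the
# question's data; the values are the ones from the question.
idx = pd.date_range("1970-01-01", periods=8, freq="4D")
xx = pd.DataFrame({"val": [2, 1, 3, np.nan, 4, 5, 6, 7]}, index=idx)

# Three ways to fill the single gap before rolling.
filled = {
    "interpolate": xx.interpolate(),  # linear: the gap becomes 3.5
    "ffill": xx.ffill(),              # previous value: the gap becomes 3.0
    "bfill": xx.bfill(),              # next value: the gap becomes 4.0
}

for name, frame in filled.items():
    means = frame.rolling("6D", min_periods=1).mean()
    print(name, means["val"].tolist())
```

Keep in mind that all three approaches put a made-up value into the gap, so the rolling means around it will generally differ from what you get by simply skipping the NaN.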
Upvotes: 0