Reputation: 21
My date-indexed data can have multiple observations for a given date. I want to get the rolling median of a value but am not getting the result that I am looking for:
df = pd.DataFrame({
'date': ['2020-06-22', '2020-06-23','2020-06-24','2020-06-24', '2020-06-25', '2020-06-26'],
'value': [2,8,5,1,3,7]
})
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
# Attempt to get the 3-day rolling median of 'value':
df['value'].rolling('3D').median()
# This yields the following, i.e. one median value
# per **observation**
# (two values for 6/24 in this example):
date
2020-06-22 2.0
2020-06-23 5.0
2020-06-24 5.0
2020-06-24 3.5
2020-06-25 4.0
2020-06-26 4.0
Name: value, dtype: float64
# I was hoping to get one median value
# per **distinct date** in the index
# The median for 6/24, for example, would be computed
# from **all** observations on 6/22, 6/23 and 6/24(2 observations)
date
2020-06-22 NaN
2020-06-23 NaN
2020-06-24 3.5
2020-06-25 4.0
2020-06-26 4.0
Name: value, dtype: float64
How do I need to change my code?
Upvotes: 1
Views: 492
Reputation: 8219
As far as I can tell, your code produces the right answer for the second occurrence of 2020-06-24
, as 3.5 is the median of 4 numbers 2,8,5,1. The first occurrence of 2020-06-24
only uses its own value and the ones from the two prior days. Presumably, and I am speculating here, it is looking at the '3D' window in the elements in the rows preceding it in the timeseries, not following.
So I think your code only needs a small modification to satisfy your requirement and that is if there are multiple rows with the same date we should just pick the last one. We will do this below with groupby
. Also you want the first two values to be NaN
rather than medians of shorter time series -- this can be achieved by passing min_periods = 3
in the rolling
function. Here is all the code, I put the median into its own column
df['median'] = df['value'].rolling('3D', min_periods = 3).median()
df.groupby(level = 0, axis = 0).last()
prints
value median
date
2020-06-22 2 NaN
2020-06-23 8 NaN
2020-06-24 1 3.5
2020-06-25 3 4.0
2020-06-26 7 4.0
Upvotes: 1