Reputation: 111
I have this data:
data = {
'id': [1, 2, 3, 4, 5, 6],
'number': [2, 3, 5, 6, 7, 8],
'date': ['2010-01-01', '2010-01-01', '2020-01-04', '2020-01-04', '2020-01-04', '2020-01-05']
}
df = pd.DataFrame(data)
I need to get the mean of col number in the last 1 day.
df.index = pd.to_datetime(df['date'])
df['mean_number'] = df['number'].rolling('1D').mean().shift()
Ps: I use .shift() for the mean not to include the current line
Result in this:
id number date mean_number
date
2010-01-01 1 2 2010-01-01 NaN
2010-01-01 2 3 2010-01-01 2.0
2020-01-04 3 5 2020-01-04 2.5
2020-01-04 4 6 2020-01-04 5.0
2020-01-04 5 7 2020-01-04 5.5
2020-01-05 6 8 2020-01-05 6.0
Id 1 is right, because there is no data before.
Id 2 is right, because is doing the mean only of the id 1.
Id 3 is wrong, because I only set 1D in the rolling window, so it was only supposed to be included 2020-01-03 and 2020-01-04.
Id 4 is right, because is doing the mean only of the id 3.
Id 5 is right, because is doing the mean only of the id 3 and id 4 (the 2 are in the range of 1D).
Id 6 is right, because is doing the mean only of the id 3, id 4 and id 5 (the 3 are in the range of 1D).
What am i doing wrong and can i fix it?
Upvotes: 0
Views: 53
Reputation: 91
Try:
df['mean_number'] = df['number'].rolling('1D', closed='left').mean()
Result:
id number date mean_number
date
2010-01-01 1 2 2010-01-01 NaN
2010-01-01 2 3 2010-01-01 2.0
2020-01-04 3 5 2020-01-04 NaN
2020-01-04 4 6 2020-01-04 5.0
2020-01-04 5 7 2020-01-04 5.5
2020-01-05 6 8 2020-01-05 6.0
🤔 humm~ not 100% sure what u are trying to do. But u can try to set the 'closed' parameter instead of using .shift()
For more detail, u can check this out: Windowing operation
Upvotes: 1