Giuliano Reginatto
Giuliano Reginatto

Reputation: 111

Pandas rolling by date interval returning wrong result

I have this data:

data = {
  'id': [1, 2, 3, 4, 5, 6],
  'number': [2, 3, 5, 6, 7, 8],
  'date': ['2010-01-01', '2010-01-01', '2020-01-04', '2020-01-04', '2020-01-04', '2020-01-05']
}

df = pd.DataFrame(data)

I need to get the mean of col number in the last 1 day.

df.index = pd.to_datetime(df['date'])
df['mean_number'] = df['number'].rolling('1D').mean().shift()

Ps: I use .shift() for the mean not to include the current line

Result in this:

            id  number        date  mean_number
date
2010-01-01   1       2  2010-01-01          NaN
2010-01-01   2       3  2010-01-01          2.0
2020-01-04   3       5  2020-01-04          2.5
2020-01-04   4       6  2020-01-04          5.0
2020-01-04   5       7  2020-01-04          5.5
2020-01-05   6       8  2020-01-05          6.0

Id 1 is right, because there is no data before.

Id 2 is right, because is doing the mean only of the id 1.

Id 3 is wrong, because I only set 1D in the rolling window, so it was only supposed to be included 2020-01-03 and 2020-01-04.

Id 4 is right, because is doing the mean only of the id 3.

Id 5 is right, because is doing the mean only of the id 3 and id 4 (the 2 are in the range of 1D).

Id 6 is right, because is doing the mean only of the id 3, id 4 and id 5 (the 3 are in the range of 1D).

What am i doing wrong and can i fix it?

Upvotes: 0

Views: 53

Answers (1)

Ben. S.
Ben. S.

Reputation: 91

Try:

df['mean_number'] = df['number'].rolling('1D', closed='left').mean()

Result:

            id  number        date  mean_number
date                                           
2010-01-01   1       2  2010-01-01          NaN
2010-01-01   2       3  2010-01-01          2.0
2020-01-04   3       5  2020-01-04          NaN
2020-01-04   4       6  2020-01-04          5.0
2020-01-04   5       7  2020-01-04          5.5
2020-01-05   6       8  2020-01-05          6.0

🤔 humm~ not 100% sure what u are trying to do. But u can try to set the 'closed' parameter instead of using .shift()

For more detail, u can check this out: Windowing operation

Upvotes: 1

Related Questions