Reputation: 887
I have this data frame:
ID Date X 123_Var 456_Var 789_Var
A 16-07-19 3 777.0 250.0 810.0
A 17-07-19 9 637.0 121.0 529.0
A 20-07-19 2 295.0 272.0 490.0
A 21-07-19 3 778.0 600.0 544.0
A 22-07-19 6 741.0 792.0 907.0
A 25-07-19 6 435.0 416.0 820.0
A 26-07-19 8 590.0 455.0 342.0
A 27-07-19 6 763.0 476.0 753.0
A 02-08-19 6 717.0 211.0 454.0
A 03-08-19 6 152.0 442.0 475.0
A 05-08-19 6 564.0 340.0 302.0
A 07-08-19 6 105.0 929.0 633.0
A 08-08-19 6 948.0 366.0 586.0
B 07-08-19 4 509.0 690.0 406.0
B 08-08-19 2 413.0 725.0 414.0
B 12-08-19 2 170.0 702.0 912.0
B 13-08-19 3 851.0 616.0 477.0
B 14-08-19 9 475.0 447.0 555.0
B 15-08-19 1 412.0 403.0 708.0
B 17-08-19 2 299.0 537.0 321.0
B 18-08-19 4 310.0 119.0 125.0
C 16-07-19 3 777.0 250.0 810.0
C 17-07-19 9 637.0 121.0 529.0
C 20-07-19 2 NaN NaN NaN
C 21-07-19 3 NaN NaN NaN
C 22-07-19 6 741.0 792.0 907.0
C 25-07-19 6 NaN NaN NaN
C 26-07-19 8 590.0 455.0 342.0
C 27-07-19 6 763.0 476.0 753.0
C 02-08-19 6 717.0 211.0 454.0
C 03-08-19 6 NaN NaN NaN
C 05-08-19 6 564.0 340.0 302.0
C 07-08-19 6 NaN NaN NaN
C 08-08-19 6 948.0 366.0 586.0
I want to show the mean
value of n
last days (using Date
column), excluding the value of current day.
I'm using this code (what should I do to fix this?):
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
n = 4
cols = df.filter(regex='Var').columns
df = df.set_index('Date')
df_ = df.set_index('ID', append=True).swaplevel(1,0)
df1 = df.groupby('ID').rolling(window=f'{n+1}D')[cols].count()
df2 = df.groupby('ID').rolling(window=f'{n+1}D')[cols].mean()
df3 = (df1.mul(df2)
.sub(df_[cols])
.div(df1[cols].sub(1)).add_suffix(f'_{n}')
)
df4 = df_.join(df3)
Expected result:
ID Date X 123_Var 456_Var 789_Var 123_Var_4 456_Var_4 789_Var_4
A 16-07-19 3 777.0 250.0 810.0 NaN NaN NaN
A 17-07-19 9 637.0 121.0 529.0 777.000000 250.000000 810.0
A 20-07-19 2 295.0 272.0 490.0 707.000000 185.500000 669.5
A 21-07-19 3 778.0 600.0 544.0 466.000000 196.500000 509.5
A 22-07-19 6 741.0 792.0 907.0 536.500000 436.000000 517.0
A 25-07-19 6 435.0 416.0 820.0 759.500000 696.000000 725.5
A 26-07-19 8 590.0 455.0 342.0 588.000000 604.000000 863.5
A 27-07-19 6 763.0 476.0 753.0 512.500000 435.500000 581.0
A 02-08-19 6 717.0 211.0 454.0 NaN NaN NaN
A 03-08-19 6 152.0 442.0 475.0 717.000000 211.000000 454.0
A 05-08-19 6 564.0 340.0 302.0 434.500000 326.500000 464.5
A 07-08-19 6 105.0 929.0 633.0 358.000000 391.000000 388.5
A 08-08-19 6 948.0 366.0 586.0 334.500000 634.500000 467.5
B 07-08-19 4 509.0 690.0 406.0 NaN NaN NaN
B 08-08-19 2 413.0 725.0 414.0 509.000000 690.000000 406.0
B 12-08-19 2 170.0 702.0 912.0 413.000000 725.000000 414.0
B 13-08-19 3 851.0 616.0 477.0 291.500000 713.500000 663.0
B 14-08-19 9 475.0 447.0 555.0 510.500000 659.000000 694.5
B 15-08-19 1 412.0 403.0 708.0 498.666667 588.333333 648.0
B 17-08-19 2 299.0 537.0 321.0 579.333333 488.666667 580.0
B 18-08-19 4 310.0 119.0 125.0 395.333333 462.333333 528.0
C 16-07-19 3 777.0 250.0 810.0 NaN NaN NaN
C 17-07-19 9 637.0 121.0 529.0 777.000000 250.000000 810.0
C 20-07-19 2 NaN NaN NaN 707.000000 185.500000 669.5
C 21-07-19 3 NaN NaN NaN 637.000000 121.000000 529.0
C 22-07-19 6 741.0 792.0 907.0 NaN NaN NaN
C 25-07-19 6 NaN NaN NaN 741.000000 792.000000 907.0
C 26-07-19 8 590.0 455.0 342.0 741.000000 792.000000 907.0
C 27-07-19 6 763.0 476.0 753.0 590.000000 455.000000 342.0
C 02-08-19 6 717.0 211.0 454.0 NaN NaN NaN
C 03-08-19 6 NaN NaN NaN 717.000000 211.000000 454.0
C 05-08-19 6 564.0 340.0 302.0 717.000000 211.000000 454.0
C 07-08-19 6 NaN NaN NaN 564.000000 340.000000 302.0
C 08-08-19 6 948.0 366.0 586.0 564.000000 340.000000 302.0
Numbers after the decimal point is not the matter.
These threads might help:
Taking the mean value of N last days
Taking the min value of N last days
Upvotes: 0
Views: 29
Reputation: 4215
Try:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df1 = (df.groupby('ID')['Date','123_Var','456_Var','789_Var'].rolling('4D', on='Date', closed='left').mean())
dfx = (df.set_index(['ID','Date'])
.join(df1.reset_index().set_index(['ID','Date']), rsuffix='_4')
.reset_index()
.drop('level_1',axis=1))
print(dfx.to_string())
ID Date X 123_Var 456_Var 789_Var 123_Var_4 456_Var_4 789_Var_4
0 A 2019-07-16 3 777.0 250.0 810.0 NaN NaN NaN
1 A 2019-07-17 9 637.0 121.0 529.0 777.000000 250.000000 810.0
2 A 2019-07-20 2 295.0 272.0 490.0 707.000000 185.500000 669.5
3 A 2019-07-21 3 778.0 600.0 544.0 466.000000 196.500000 509.5
4 A 2019-07-22 6 741.0 792.0 907.0 536.500000 436.000000 517.0
5 A 2019-07-25 6 435.0 416.0 820.0 759.500000 696.000000 725.5
6 A 2019-07-26 8 590.0 455.0 342.0 588.000000 604.000000 863.5
7 A 2019-07-27 6 763.0 476.0 753.0 512.500000 435.500000 581.0
8 A 2019-08-02 6 717.0 211.0 454.0 NaN NaN NaN
9 A 2019-08-03 6 152.0 442.0 475.0 717.000000 211.000000 454.0
10 A 2019-08-05 6 564.0 340.0 302.0 434.500000 326.500000 464.5
11 A 2019-08-07 6 105.0 929.0 633.0 358.000000 391.000000 388.5
12 A 2019-08-08 6 948.0 366.0 586.0 334.500000 634.500000 467.5
13 B 2019-08-07 4 509.0 690.0 406.0 NaN NaN NaN
14 B 2019-08-08 2 413.0 725.0 414.0 509.000000 690.000000 406.0
15 B 2019-08-12 2 170.0 702.0 912.0 413.000000 725.000000 414.0
16 B 2019-08-13 3 851.0 616.0 477.0 170.000000 702.000000 912.0
17 B 2019-08-14 9 475.0 447.0 555.0 510.500000 659.000000 694.5
18 B 2019-08-15 1 412.0 403.0 708.0 498.666667 588.333333 648.0
19 B 2019-08-17 2 299.0 537.0 321.0 579.333333 488.666667 580.0
20 B 2019-08-18 4 310.0 119.0 125.0 395.333333 462.333333 528.0
21 C 2019-07-16 3 777.0 250.0 810.0 NaN NaN NaN
22 C 2019-07-17 9 637.0 121.0 529.0 777.000000 250.000000 810.0
23 C 2019-07-20 2 NaN NaN NaN 707.000000 185.500000 669.5
24 C 2019-07-21 3 NaN NaN NaN 637.000000 121.000000 529.0
25 C 2019-07-22 6 741.0 792.0 907.0 NaN NaN NaN
26 C 2019-07-25 6 NaN NaN NaN 741.000000 792.000000 907.0
27 C 2019-07-26 8 590.0 455.0 342.0 741.000000 792.000000 907.0
28 C 2019-07-27 6 763.0 476.0 753.0 590.000000 455.000000 342.0
29 C 2019-08-02 6 717.0 211.0 454.0 NaN NaN NaN
30 C 2019-08-03 6 NaN NaN NaN 717.000000 211.000000 454.0
31 C 2019-08-05 6 564.0 340.0 302.0 717.000000 211.000000 454.0
32 C 2019-08-07 6 NaN NaN NaN 564.000000 340.000000 302.0
33 C 2019-08-08 6 948.0 366.0 586.0 564.000000 340.000000 302.0
Upvotes: 1