Muhammad Hassan
Muhammad Hassan

Reputation: 4229

pandas dataframe getting lag values from same month of previous year

I have a following Dataframe:

           Id         Year             Month  Amount
0         1845        2010.0            1.0  18480.0
1         1845        2010.0            2.0  19560.0
2         1845        2010.0            3.0  23160.0
3         1845        2010.0            4.0  26360.0
4         1845        2010.0            5.0  31560.0
5         1845        2010.0            6.0  36080.0
6         1845        2010.0            7.0  35440.0
7         1845        2010.0            8.0  32960.0
8         1845        2010.0            9.0  33560.0
9         1845        2010.0           10.0  24240.0
10        1845        2010.0           11.0  24080.0
11        1845        2010.0           12.0  19680.0
12        1845        2011.0            1.0  18400.0
13        1845        2011.0            2.0  17960.0
14        1845        2011.0            3.0  20200.0
15        1845        2011.0            4.0  25560.0
16        1845        2011.0            5.0  29240.0
17        1845        2011.0            6.0  29160.0
18        1845        2011.0            7.0  29480.0
19        1845        2011.0            8.0  31000.0
20        1845        2011.0            9.0  25920.0
21        1845        2011.0           10.0  22560.0
22        1845        2011.0           11.0  25120.0
23        1845        2011.0           12.0  19640.0
24        1845        2012.0            1.0  20800.0
25        1845        2012.0            2.0  18880.0
26        1845        2012.0            3.0  21960.0
27        1845        2012.0            4.0  26920.0
28        1845        2012.0            5.0  28280.0
29        1845        2012.0            6.0  29160.0
30        1845        2012.0            7.0  32520.0
31        1845        2012.0            8.0  29720.0
32        1845        2012.0            9.0  28880.0
33        1845        2012.0           10.0  23760.0
34        1845        2012.0           11.0  22040.0
35        1845        2012.0           12.0  21960.0

And for every id, I want to go to same month in previous year and get the mean of last 2 values of Amount column. For Example, for id=1845, year 2011 and month 6, I want mean of amount values of 05/2010 and 04/2010. How can I do this? My actual dataframe is quite large with a lot of ids and historical data of around 3 years for every id.

Upvotes: 2

Views: 750

Answers (1)

piRSquared
piRSquared

Reputation: 294278

  1. Pivot such that 'Year' is in the index.
  2. Follow up with a shift to lag the year.
  3. Then stack, shift and use rolling(2).mean()

d = df.pivot_table('Amount', 'Year', ['Id', 'Month'])
d.shift().stack().shift().rolling(2).mean().stack().rename('Amount').reset_index()

      Year  Month    Id   Amount
0   2011.0    3.0  1845  19020.0
1   2011.0    4.0  1845  21360.0
2   2011.0    5.0  1845  24760.0
3   2011.0    6.0  1845  28960.0
4   2011.0    7.0  1845  33820.0
5   2011.0    8.0  1845  35760.0
6   2011.0    9.0  1845  34200.0
7   2011.0   10.0  1845  33260.0
8   2011.0   11.0  1845  28900.0
9   2011.0   12.0  1845  24160.0
10  2012.0    1.0  1845  21880.0
11  2012.0    2.0  1845  19040.0
12  2012.0    3.0  1845  18180.0
13  2012.0    4.0  1845  19080.0
14  2012.0    5.0  1845  22880.0
15  2012.0    6.0  1845  27400.0
16  2012.0    7.0  1845  29200.0
17  2012.0    8.0  1845  29320.0
18  2012.0    9.0  1845  30240.0
19  2012.0   10.0  1845  28460.0
20  2012.0   11.0  1845  24240.0
21  2012.0   12.0  1845  23840.0

Upvotes: 3

Related Questions