ceeka9388
ceeka9388

Reputation: 69

Apply Custom Function to Pandas Groupby using Rolling 12 Month Window

I have the following Dataframe, here is a sample:

Area    Num Month   Year    Type    Result
AA      1   Jan     2015    A       1
AA      1   Feb     2015    A       2
AA      1   Mar     2015    A       3
AA      1   Apr     2015    A       4
AA      1   May     2015    A       5
AA      1   Jun     2015    A       6
AA      1   Jul     2015    A       7
AA      1   Aug     2015    A       8
AA      1   Sep     2015    A       9
AA      1   Oct     2015    A       10
AA      1   Nov     2015    A       11
AA      1   Dec     2015    A       12
BB      2   Jan     2015    B       1
BB      2   Feb     2015    B       2
BB      2   Mar     2015    B       3
BB      2   Apr     2015    B       4
BB      2   May     2015    B       5
BB      2   Jun     2015    B       6
BB      2   Jul     2015    B       7
BB      2   Aug     2015    B       8
BB      2   Sep     2015    B       9
BB      2   Oct     2015    B       10
BB      2   Nov     2015    B       11
BB      2   Dec     2015    B       12

I need to groupby by Columns: Area, Num, Type, Year and calculate 90th percentile for each of these groups. I have multiple Num and Type for each Area.

My expected output is:

I need to calculate 12 month percentile value for previous twelve months for each of this group and paste the output in next month.

For example: Calculate the 90th percentile values for the group with {Area:AA,Num:1,Year:2015,Type:A} and assign it to {Area:AA,Num:1,Year:2016,Type:A,Month:Jan}.

Area    Num Month   Year    Type    Result
AA      1   Jan     2016    A       10.9
AA      1   Feb     2016    A       10.99

Now for new group(Month February) {Area:AA,Num:1,Year:2016,Type:A,Month:Feb} should calculate the value of 90th Percentile of Result columnn from Feb 2015 to Jan 2016 and so on.

I am finding solution to calculate percentile and groupby individually, but I am stuck as to how to calculate percentile on this moving window.

Upvotes: 2

Views: 2081

Answers (1)

Scott Boston
Scott Boston

Reputation: 153510

df2 = df.assign(date=pd.to_datetime(df.Year.astype(str).add("-").add(df.Month.astype(str)).add("-").add(str(1))))
df2 = df2.set_index('date')
df2.groupby(['Area','Num','Type'])['Result'].rolling(12,min_periods=3).quantile(.9).reset_index()

Output:

   Area  Num Type       date  Result
0    AA    1    A 2015-01-01     NaN
1    AA    1    A 2015-02-01     NaN
2    AA    1    A 2015-03-01     2.0
3    AA    1    A 2015-04-01     3.0
4    AA    1    A 2015-05-01     4.0
5    AA    1    A 2015-06-01     5.0
6    AA    1    A 2015-07-01     6.0
7    AA    1    A 2015-08-01     7.0
8    AA    1    A 2015-09-01     8.0
9    AA    1    A 2015-10-01     9.0
10   AA    1    A 2015-11-01    10.0
11   AA    1    A 2015-12-01    10.0
12   BB    2    B 2015-01-01     NaN
13   BB    2    B 2015-02-01     NaN
14   BB    2    B 2015-03-01     2.0
15   BB    2    B 2015-04-01     3.0
16   BB    2    B 2015-05-01     4.0
17   BB    2    B 2015-06-01     5.0
18   BB    2    B 2015-07-01     6.0
19   BB    2    B 2015-08-01     7.0
20   BB    2    B 2015-09-01     8.0
21   BB    2    B 2015-10-01     9.0
22   BB    2    B 2015-11-01    10.0
23   BB    2    B 2015-12-01    10.0

Upvotes: 1

Related Questions