S.Gu
S.Gu

Reputation: 649

Pandas Moving Average on Date

I have a df presented below:

enter image description here

Added code for dataframe:

df = pd.DataFrame({'Name':['John']*12
                  ,'Type':[*'AABBBBAAABAB']
                  ,'Period':['Noon','Morning']*6
                  ,'Date':['1/1/2020']*2+['1/4/2020']*2+
                          ['1/5/2020']*2+['1/2/2020']*2+
                          ['1/3/2020']*2+['1/19/2020', '1/31/2020']
                  ,'value':[1,2,7,8,9,10,3,4,5,6,5,6]}, index=range(1,13))

Basically I need 2 consecutive days moving average on a person's (John as an example) value, classified by type and period (person too, but not in this table).

Expected result as shown last column. eg.1. row 4, John's B type in Morning, average of value on 1/4/2020 and 1/3/2020 (row10) is (6+8)/2=7, and no data prior to 1/3/2020 of John's type B, so row10 result is na

eg.2. row 11 and 12, they are not consecutive date, so both are na

How do I use pandas to realise this function? should I use transform or rolling?

Upvotes: 1

Views: 230

Answers (1)

Scott Boston
Scott Boston

Reputation: 153460

You can use the following:

df.set_index('Date')\
  .groupby(['Name','Type','Period'])['value']\
  .rolling('2D', min_periods=2)\
  .mean()\
  .reset_index()

Output:

    Name Type   Period       Date  value
0   John    A  Morning 2020-01-01    NaN
1   John    A  Morning 2020-01-02    3.0
2   John    A     Noon 2020-01-01    NaN
3   John    A     Noon 2020-01-02    2.0
4   John    A     Noon 2020-01-03    4.0
5   John    A     Noon 2020-01-19    NaN
6   John    B  Morning 2020-01-04    NaN
7   John    B  Morning 2020-01-05    9.0
8   John    B  Morning 2020-01-03    8.0
9   John    B  Morning 2020-01-31    NaN
10  John    B     Noon 2020-01-04    NaN
11  John    B     Noon 2020-01-05    8.0

And, one way to merge it back to your original dataframe:

df_er = (df.set_index('Date')
           .groupby(['Name','Type','Period'])['value']
           .rolling('2D', min_periods=2)
           .mean()
           .rename('Expected Result'))
df.merge(df_er, left_on=['Name', 'Type', 'Period', 'Date'], right_index=True)

Output:

    Name Type   Period       Date  value  Expected Result
1   John    A     Noon 2020-01-01      1              NaN
2   John    A  Morning 2020-01-01      2              NaN
3   John    B     Noon 2020-01-04      7              NaN
4   John    B  Morning 2020-01-04      8              NaN
5   John    B     Noon 2020-01-05      9              8.0
6   John    B  Morning 2020-01-05     10              9.0
7   John    A     Noon 2020-01-02      3              2.0
8   John    A  Morning 2020-01-02      4              3.0
9   John    A     Noon 2020-01-03      5              4.0
10  John    B  Morning 2020-01-03      6              8.0
11  John    A     Noon 2020-01-19      5              NaN
12  John    B  Morning 2020-01-31      6              NaN

Upvotes: 2

Related Questions