NM2
NM2

Reputation: 127

Conditional aggregation after rolling in pandas

I am trying to calculate a rolling mean of a specific column based on a condition in another column. The condition is to create three different rolling means for column A, as follows -

  1. The rolling mean of A when column B is less than 2
  2. The rolling mean of A when column B is equal to 2
  3. The rolling mean of A when column B is greater than 2

Consider the following df with a window size of 2

    A   B
0   1   2
1   2   4
2   3   4
3   4   6
4   5   1
5   6   2 

The output will be the following-

    rolling less    rolling equal    rolling greater
0   NaN              NaN             NaN     
1   NaN              1               2   
2   NaN              NaN             2.5
3   NaN              NaN             3.5
4   5                NaN             4
5   5                6               NaN

The main difficulty I encountered was that the rolling function is column-wise, and on the other hand, the apply function works rows-wise, but then, calculating the rolling mean is too hard-coded.

Any ideas? Thanks a lot.

Upvotes: 0

Views: 39

Answers (2)

G.G
G.G

Reputation: 765

def function1(ss:pd.Series):
    df11=df1.loc[:ss.name].tail(2)
    return pd.Series([
             df11.loc[lambda dd:dd.B<2,'A'].mean()
            ,df11.loc[lambda dd:dd.B==2,'A'].mean()
            ,df11.loc[lambda dd:dd.B>2,'A'].mean()
         ],index=['rolling less','rolling equal','rolling greater'],name=ss.name)

pd.concat([df1.A.shift(i) for i in range(2)],axis=1)\
    .apply(function1,axis=1)
        
        
          A  B  rolling less  rolling equal  rolling greater
        0  1  2           NaN            1.0              NaN
        1  2  4           NaN            1.0              2.0
        2  3  4           NaN            NaN              2.5
        3  4  6           NaN            NaN              3.5
        4  5  1           5.0            NaN              4.0
        5  6  2           5.0            6.0              NaN

Upvotes: 0

Corralien
Corralien

Reputation: 120499

You can create your 3 columns before rolling then compute it:

out = df.join(df.assign(rolling_less=df.mask(df['B'] >= 2)['A'],
                        rolling_equal=df.mask(df['B'] != 2)['A'],
                        rolling_greater=df.mask(df['B'] <= 2)['A'])
                .filter(like='rolling').rolling(2, min_periods=1).mean())
print(out)

# Output
   A  B  rolling_less  rolling_equal  rolling_greater
0  1  2           NaN            1.0              NaN
1  2  4           NaN            1.0              2.0
2  3  4           NaN            NaN              2.5
3  4  6           NaN            NaN              3.5
4  5  1           5.0            NaN              4.0
5  6  2           5.0            6.0              NaN

Upvotes: 1

Related Questions