SOK
SOK

Reputation: 1792

How to perform a conditional groupby calculation in pandas and forward fill zero values

I have the following dataframe:

import pandas as pd
#Create DF
d = { 
     'Date': ['1/01/2021','2/01/2021','3/01/2021','4/01/2021','5/01/2021','6/01/2021','7/01/2021','8/01/2021','9/01/2021','10/01/2021','11/01/2021','12/01/2021','13/01/2021',
'14/01/2021','15/01/2021','16/01/2021'],
    'Name': ['Joe','Joe','Joe','Joe','Joe','Joe','Joe','Joe','Joe','John','John','John','John','John','John','John'],
    'Criteria':[30,35,2.5,3,6,15,20,1.5,4,25,50,75,2,4,9,2],
    'Sum' : [-1,-1,1.5,-1,5,-1,-1,-1,3,-1,-1,-1,-1,3,-1,-1],
    'Sum Divided by Count where Criteria <5' : [0.000,0.000,1.500,0.250,0.000,0.000,0.000,-0.167,0.625,0.000,0.000,0.000,-1.000,1.000,0.000,0.333]
    
    }

              
df = pd.DataFrame(data=d)
df['Date'] = pd.to_datetime(df.Date,format='%d/%m/%Y')
df

enter image description here

The last column is calculating the sum of the column Sum where Criteria is less than 5 divided by the count of rows where Criteria is less than 5. And grouped by Name. The red is hihglighting the rows that match the criteria.

What i am also trying to do is also add a forward fill to all rows but keep the initial rows as zeros.

Is all of the above possible in pandas?

My end result should be:

enter image description here

Any help would be much apreciated!

Upvotes: 2

Views: 139

Answers (1)

jezrael
jezrael

Reputation: 863166

Repalce values below 5 to missing values in Sum in DataFrame.assign and then per groups call Expanding.mean for divide sum by counts and last if values not match repalce them to 0 in Series.where:

m = df['Criteria'] < 5
df['new'] = (df.assign(Sum = df['Sum'].where(m))
               .groupby('Name')['Sum']
               .expanding()
               .mean()
               .reset_index(level=0, drop=True)
               .where(m)
               .groupby(df['Name'])
               .ffill()
               .fillna(0))

print (df)
         Date  Name  Criteria  Sum  Sum Divided by Count where Criteria <5  \
0  2021-01-01   Joe      30.0 -1.0                                   0.000   
1  2021-01-02   Joe      35.0 -1.0                                   0.000   
2  2021-01-03   Joe       2.5  1.5                                   1.500   
3  2021-01-04   Joe       3.0 -1.0                                   0.250   
4  2021-01-05   Joe       6.0  5.0                                   0.000   
5  2021-01-06   Joe      15.0 -1.0                                   0.000   
6  2021-01-07   Joe      20.0 -1.0                                   0.000   
7  2021-01-08   Joe       1.5 -1.0                                  -0.167   
8  2021-01-09   Joe       4.0  3.0                                   0.625   
9  2021-01-10  John      25.0 -1.0                                   0.000   
10 2021-01-11  John      50.0 -1.0                                   0.000   
11 2021-01-12  John      75.0 -1.0                                   0.000   
12 2021-01-13  John       2.0 -1.0                                  -1.000   
13 2021-01-14  John       4.0  3.0                                   1.000   
14 2021-01-15  John       9.0 -1.0                                   0.000   
15 2021-01-16  John       2.0 -1.0                                   0.333   

         new  
0   0.000000  
1   0.000000  
2   1.500000  
3   0.250000  
4   0.250000  
5   0.250000  
6   0.250000  
7  -0.166667  
8   0.625000  
9   0.000000  
10  0.000000  
11  0.000000  
12 -1.000000  
13  1.000000  
14  1.000000  
15  0.333333  

Upvotes: 1

Related Questions