Reputation: 9
Hello I would like to calculate the average of days column for each Name group at each Time. The average at each Time for each group should be calculated only based on the elements of the rows of the Days which are larger than 0. Any feedback would be highly appreciated ..
Name Time Days Average
John 2021-12-02 0 0
John 2021-12-03 2 0
John 2021-12-05 9 2
John 2021-12-07 0 5.5
John 2021-12-10 10 5.5
Larry 2021-12-02 20 0
Jim 2021-12-09 20 0
Jim 2021-12-10 20 20
Jim 2021-12-12 40 20
Jim 2021-12-12 0 26.6
Juli 2021-11-09 0 0
Juli 2021-11-10 0 0
Juli 2021-11-12 40 0
Juli 2021-11-18 0 40
Juli 2021-11-12 0 40
Juli 2021-11-18 2 40
Juli 2021-11-19 0 21
Upvotes: 1
Views: 74
Reputation: 862481
First replace 0
to missing values and then use GroupBy.transform
with lambda function for Series.expanding
with mean
and Series.shift
, last replace NaN
s to 0
by Series.fillna
:
df['Avg'] = (df.assign(Days = df['Days'].replace(0,np.nan))
.groupby('Name')['Days']
.transform(lambda x: x.expanding().mean().shift())
.fillna(0))
print (df)
Name Time Days Average Avg
0 John 2021-12-02 0 0.0 0.000000
1 John 2021-12-03 2 0.0 0.000000
2 John 2021-12-05 9 2.0 2.000000
3 John 2021-12-07 0 5.5 5.500000
4 John 2021-12-10 10 5.5 5.500000
5 Larry 2021-12-02 20 0.0 0.000000
6 Jim 2021-12-09 20 0.0 0.000000
7 Jim 2021-12-10 20 20.0 20.000000
8 Jim 2021-12-12 40 20.0 20.000000
9 Jim 2021-12-12 0 26.6 26.666667
10 Juli 2021-11-09 0 0.0 0.000000
11 Juli 2021-11-10 0 0.0 0.000000
12 Juli 2021-11-12 40 0.0 0.000000
13 Juli 2021-11-18 0 40.0 40.000000
14 Juli 2021-11-12 0 40.0 40.000000
15 Juli 2021-11-18 2 40.0 40.000000
16 Juli 2021-11-19 0 21.0 21.000000
Upvotes: 1