Reputation: 2977
I have a data frame as below:
user | profit
-------------
Anna | 1.0
Bell | 2.0
Anna | 2.0
Chad | 5.0
Bell | 4.0
Anna | 3.0
I need to compute, for each row, a running mean of profit at the user level: each time the same user appears, I compute the mean of his/her profits so far.
For instance, Anna's first profit mean is 1.0 and her second profit mean becomes 1.5, and so on.
The desired result looks like:
user | profit | mean
--------------------
Anna | 1.0 | 1.0
Bell | 2.0 | 2.0
Anna | 2.0 | 1.5
Chad | 5.0 | 5.0
Bell | 4.0 | 3.0
Anna | 3.0 | 2.0
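In plain Python, the computation described above would look roughly like this minimal sketch. It keeps a running sum and count per user while walking the rows in order. The function name `running_user_mean` is just for illustration.

```python
from collections import defaultdict


def running_user_mean(users, profits):
    """For each row, return the mean of that user's profits up to and including that row."""
    totals = defaultdict(float)  # running profit sum per user
    counts = defaultdict(int)    # number of rows seen so far per user
    means = []
    for user, profit in zip(users, profits):
        totals[user] += profit
        counts[user] += 1
        means.append(totals[user] / counts[user])
    return means


users = ["Anna", "Bell", "Anna", "Chad", "Bell", "Anna"]
profits = [1.0, 2.0, 2.0, 5.0, 4.0, 3.0]
print(running_user_mean(users, profits))
# [1.0, 2.0, 1.5, 5.0, 3.0, 2.0]
```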
Any suggestions for doing this in Python/pandas?
import pandas as pd
record = pd.DataFrame({
    "user": ("Anna", "Bell", "Anna", "Chad", "Bell", "Anna"),
    "profit": (1.0, 2.0, 2.0, 5.0, 4.0, 3.0)
})
Thanks!
Upvotes: 1
Views: 96
Reputation: 18924
I think you can use a grouped cumsum() and divide it by the number of rows seen so far for each user (cumcount() + 1).
g = df.groupby('user')['profit']
df['mean'] = g.cumsum() / (g.cumcount() + 1)
Full example:
import pandas as pd
df = pd.DataFrame({
    "user": ("Anna", "Bell", "Anna", "Chad", "Bell", "Anna"),
    "profit": (1.0, 2.0, 2.0, 5.0, 4.0, 3.0)
})
# Group profits by user, then divide each user's running total
# by how many rows that user has had so far (cumcount() starts at 0)
g = df.groupby('user')['profit']
df['mean'] = g.cumsum() / (g.cumcount() + 1)
print(df)
Output:
user profit mean
0 Anna 1.0 1.0
1 Bell 2.0 2.0
2 Anna 2.0 1.5
3 Chad 5.0 5.0
4 Bell 4.0 3.0
5 Anna 3.0 2.0
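To see why the division works, it may help to print the two intermediate series side by side. This is a small sketch using the same data; the helper column names `cum_profit` and `n_seen` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "user": ("Anna", "Bell", "Anna", "Chad", "Bell", "Anna"),
    "profit": (1.0, 2.0, 2.0, 5.0, 4.0, 3.0),
})

g = df.groupby("user")["profit"]
# Running total of profit within each user, aligned to the original row order.
df["cum_profit"] = g.cumsum()
# cumcount() numbers each user's rows 0, 1, 2, ...; adding 1 turns that into a count.
df["n_seen"] = g.cumcount() + 1
df["mean"] = df["cum_profit"] / df["n_seen"]
print(df)
```

For Anna's third row, for example, `cum_profit` is 1.0 + 2.0 + 3.0 = 6.0 and `n_seen` is 3, which gives the mean of 2.0.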
Upvotes: 1
Reputation: 863801
Use GroupBy.transform with rolling and mean:
df['mean'] = (df.groupby('user')['profit']
                .transform(lambda x: x.rolling(len(x), min_periods=1).mean()))
print(df)
user profit mean
0 Anna 1.0 1.0
1 Bell 2.0 2.0
2 Anna 2.0 1.5
3 Chad 5.0 5.0
4 Bell 4.0 3.0
5 Anna 3.0 2.0
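A rolling window as long as the whole group, with `min_periods=1`, behaves like an expanding window. So the same result can probably be written more directly with pandas' `expanding()`. This is a swapped-in variant, not part of the answer above. Here is a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "user": ("Anna", "Bell", "Anna", "Chad", "Bell", "Anna"),
    "profit": (1.0, 2.0, 2.0, 5.0, 4.0, 3.0),
})

# expanding() grows the window from each user's first row to the current row,
# so its mean is that user's average profit so far (min_periods defaults to 1).
df["mean"] = (df.groupby("user")["profit"]
                .transform(lambda x: x.expanding().mean()))
print(df)
```

The design choice is mostly readability. `expanding()` states the "everything so far" intent without needing `len(x)` as the window size.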
Upvotes: 2