Rock

Reputation: 2977

Compute row average by another column in Python/Pandas

I have a data frame as below:

user | profit
-------------
Anna |    1.0
Bell |    2.0
Anna |    2.0
Chad |    5.0
Bell |    4.0
Anna |    3.0

For each row, I need to compute the running mean of profit per user: each time the same user appears, I want the mean of that user's profits so far.

For instance, Anna's first profit mean is 1.0 and her second profit mean becomes 1.5, and so on.

The desired result looks like:

user | profit | mean
--------------------
Anna |    1.0 |  1.0
Bell |    2.0 |  2.0
Anna |    2.0 |  1.5
Chad |    5.0 |  5.0
Bell |    4.0 |  3.0
Anna |    3.0 |  2.0

Any suggestions to do so in Python/Pandas?

import pandas as pd

record = pd.DataFrame({
    "user": ("Anna", "Bell", "Anna", "Chad", "Bell", "Anna"), 
    "profit": (1.0, 2.0, 2.0, 5.0, 4.0, 3.0)
})

Thanks!

Upvotes: 1

Views: 96

Answers (2)

Anton vBR

Reputation: 18924

I think we can use cumsum() and divide by the count so far (cumcount() + 1, since cumcount() starts at 0).

g = df.groupby('user')['profit']
df['mean'] = g.cumsum() / (g.cumcount() + 1)

Full example

import pandas as pd

df = pd.DataFrame({
    "user": ("Anna", "Bell", "Anna", "Chad", "Bell", "Anna"), 
    "profit": (1.0, 2.0, 2.0, 5.0, 4.0, 3.0)
})

g = df.groupby('user')['profit']
df['mean'] = g.cumsum() / (g.cumcount() + 1)

print(df)

Returns:

   user  profit  mean
0  Anna     1.0   1.0
1  Bell     2.0   2.0
2  Anna     2.0   1.5
3  Chad     5.0   5.0
4  Bell     4.0   3.0
5  Anna     3.0   2.0
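
One caveat with this approach: cumcount() counts every row, while groupby cumsum() skips missing values, so the two can drift apart if profit contains NaN. The following is only a sketch of one way to handle that, assuming you want missing profits ignored; the helper name running_group_mean is hypothetical and not part of pandas.

```python
import pandas as pd

def running_group_mean(df, key, value):
    """Running mean of `value` within each `key` group, ignoring missing values.

    Hypothetical helper for illustration; not a pandas API.
    """
    g = df.groupby(key)[value]
    # Count only the non-missing values seen so far in each group
    seen = df[value].notna().astype(int).groupby(df[key]).cumsum()
    # Running sum divided by the running count of valid values
    return g.cumsum() / seen

record = pd.DataFrame({
    "user": ("Anna", "Bell", "Anna", "Chad", "Bell", "Anna"),
    "profit": (1.0, 2.0, 2.0, 5.0, 4.0, 3.0),
})
record["mean"] = running_group_mean(record, "user", "profit")
print(record)
```

With no missing values, as in the question's data, this gives the same column as cumsum() / (cumcount() + 1).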

Upvotes: 1

jezrael

Reputation: 863801

Use GroupBy.transform with rolling and mean:

df['mean'] = (df.groupby('user')['profit']
                .transform(lambda x: x.rolling(len(x), min_periods=1).mean()))
print(df)

Returns:

   user  profit  mean
0  Anna     1.0   1.0
1  Bell     2.0   2.0
2  Anna     2.0   1.5
3  Chad     5.0   5.0
4  Bell     4.0   3.0
5  Anna     3.0   2.0
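
A rolling window as long as the whole group with min_periods=1 averages every value seen so far, which is what an expanding window does. As an alternative sketch (not part of the answer above), the same column can probably be written more directly with expanding():

```python
import pandas as pd

df = pd.DataFrame({
    "user": ("Anna", "Bell", "Anna", "Chad", "Bell", "Anna"),
    "profit": (1.0, 2.0, 2.0, 5.0, 4.0, 3.0),
})

# expanding() grows the window from the first row of each group to the current row,
# so its mean is the running mean per user
df["mean"] = (df.groupby("user")["profit"]
                .transform(lambda x: x.expanding().mean()))
print(df)
```

This avoids passing len(x) as the window size and states the intent (a cumulative mean) a little more plainly.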

Upvotes: 2
