Morticia A. Addams
Morticia A. Addams

Reputation: 383

Add column with average value grouped by column

I want to replace column value of dataframe with mean(without zeros) value of column grouped by another column.

Dataframe df is like:

ID | TYPE | rate
-------------
1  |    A | 0 <- Replace this
2  |    B | 2
3  |    C | 1
4  |    A | 2
5  |    C | 1
6  |    C | 0 <- Replace this
7  |    C | 8
8  |    C | 2
9  |    D | 0 <- Replace this

I have to replace values in rating where rating = 0:

df['rate'][df['rate']==0] = ?

with average value for that TYPE.

Average(without zeros) value for every type is:

A = 2/1 = 2

B = 2/1 = 2

C = (1 + 1 + 8 + 2)/4 = 3

D = 0 (default value when there isn't information for type)

Expected result:

ID | TYPE | rate
-------------
1  |    A | 2 <- Changed
2  |    B | 2
3  |    C | 1
4  |    A | 2
5  |    C | 1
6  |    C | 3 <- Changed
7  |    C | 8
8  |    C | 2
9  |    D | 0 <- Changed

Upvotes: 3

Views: 172

Answers (1)

yatu
yatu

Reputation: 88236

You could mask the rate column in the dataframe, GroupBy the TYPE and transform with the mean, which will exlude NaNs. The use fillna to replace the values in the masked dataframe:

ma = df.rate.mask(df.rate.eq(0))
df['rate'] = ma.fillna(ma.groupby(df.TYPE).transform('mean').fillna(0))

   ID  TYPE  rate
0   1    A   2.0
1   2    B   2.0
2   3    C   1.0
3   4    A   2.0
4   5    C   1.0
5   6    C   3.0
6   7    C   8.0
7   8    C   2.0
8   9    D   0.0

Upvotes: 3

Related Questions