Pandas, replace rows by mean over given columns

Question

I'm fairly new to pandas and, unfortunately, don't have as much time to dig into it right now as I would like.

I have a dataframe like this:

   x  y  z  class     id  other-numeric-field
0  8  8  5      1  1014f             0.388640
1  2  3  4      0  3ba1d             0.431008
2  5  1  6      1  1014f             0.388640
3  7  9  6      1  1014f             0.388640
4  6  9  1      0  7a5d7             0.476972

I'd like to replace all rows that share the same class with their mean over the ['x', 'y', 'z'] columns.

The dataframe can contain other columns, numeric or not. Their values are usually identical within a class, and I don't mind losing information where they are not. I could keep the first occurrence, or average over them as well if that also works for non-numeric fields.

Bharath M Shetty · Accepted Answer

You might be looking for agg, which lets you choose a separate aggregation for each column:

ndf = df.groupby('class').agg({'x':'mean','y':'mean','z':'mean','id':'first','other-numeric-field':'first'})

          id  other-numeric-field         x         z  y
class                                                   
0      3ba1d             0.431008  4.000000  2.500000  6
1      1014f             0.388640  6.666667  5.666667  6
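
The question mentions that the dataframe may hold other columns as well. Below is a minimal sketch on the sample data from the question, assuming every column other than x, y and z should simply keep its first value per class. It builds the mapping passed to agg instead of typing it out. The names mean_cols and how are illustrative and do not come from the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x": [8, 2, 5, 7, 6],
    "y": [8, 3, 1, 9, 9],
    "z": [5, 4, 6, 6, 1],
    "class": [1, 0, 1, 1, 0],
    "id": ["1014f", "3ba1d", "1014f", "1014f", "7a5d7"],
    "other-numeric-field": [0.388640, 0.431008, 0.388640, 0.388640, 0.476972],
})

mean_cols = ["x", "y", "z"]  # columns to average within each class

# Average the chosen columns; keep the first value of every other column.
# The grouping key itself is left out of the mapping.
how = {
    col: ("mean" if col in mean_cols else "first")
    for col in df.columns
    if col != "class"
}

ndf = df.groupby("class").agg(how)
print(ndf)
```

With this mapping, a column added later is picked up automatically and aggregated with 'first', which works for non-numeric fields such as id.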

To apply this only to class 0, one approach is to aggregate as before, keep the rows of the other classes untouched, and append the aggregated class-0 row to them:

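import pandas as pd

ndf = df.groupby('class',as_index=False).agg({'x':'mean','y':'mean','z':'mean','id':'first','other-numeric-field':'first'})

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job (sort=True orders the columns alphabetically)
sdf = pd.concat([df[df['class'].ne(0)], ndf[ndf['class'].eq(0)]], ignore_index=True, sort=True)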

 class     id  other-numeric-field    x  y    z
0      1  1014f             0.388640  8.0  8  5.0
1      1  1014f             0.388640  5.0  1  6.0
2      1  1014f             0.388640  7.0  9  6.0
3      0  3ba1d             0.431008  4.0  6  2.5
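
A self-contained sketch of this class-0-only step on the sample data from the question. It uses pd.concat because DataFrame.append is not available in pandas 2.0 and later. The names agg_spec, untouched and collapsed are illustrative and not part of the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x": [8, 2, 5, 7, 6],
    "y": [8, 3, 1, 9, 9],
    "z": [5, 4, 6, 6, 1],
    "class": [1, 0, 1, 1, 0],
    "id": ["1014f", "3ba1d", "1014f", "1014f", "7a5d7"],
    "other-numeric-field": [0.388640, 0.431008, 0.388640, 0.388640, 0.476972],
})

agg_spec = {"x": "mean", "y": "mean", "z": "mean",
            "id": "first", "other-numeric-field": "first"}

# as_index=False keeps 'class' as a regular column so it survives the concat.
ndf = df.groupby("class", as_index=False).agg(agg_spec)

untouched = df[df["class"].ne(0)]    # rows of every class except 0, unchanged
collapsed = ndf[ndf["class"].eq(0)]  # the single aggregated row for class 0

# Stack the two pieces and renumber the rows from 0.
sdf = pd.concat([untouched, collapsed], ignore_index=True)
print(sdf)
```

The same pattern should work for any subset of classes by swapping the ne(0) and eq(0) masks for isin([...]) and its negation.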
