Reputation: 5294
I'm pretty new to pandas and, unfortunately, I don't have as much time to dig into it at the moment as I would like.
I have a dataframe like this:
   x  y  z  class     id  other-numeric-field
0  8  8  5      1  1014f             0.388640
1  2  3  4      0  3ba1d             0.431008
2  5  1  6      1  1014f             0.388640
3  7  9  6      1  1014f             0.388640
4  6  9  1      0  7a5d7             0.476972
I'd like to replace all rows with the same class with their mean over the ['x', 'y', 'z'] columns.
The dataframe can contain other columns, numeric or not. Their values are usually identical within a class, and I don't mind losing the differences when they are not: I could keep the first occurrence, or just average over them too if that also works with non-numeric fields.
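For anyone who wants to reproduce this, the sample frame above can be rebuilt roughly like this. It is just the table typed in by hand, so the exact dtypes are an assumption:

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "x": [8, 2, 5, 7, 6],
    "y": [8, 3, 1, 9, 9],
    "z": [5, 4, 6, 6, 1],
    "class": [1, 0, 1, 1, 0],
    "id": ["1014f", "3ba1d", "1014f", "1014f", "7a5d7"],
    "other-numeric-field": [0.388640, 0.431008, 0.388640, 0.388640, 0.476972],
})

print(df)
```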
Upvotes: 0
Views: 1189
Reputation: 30605
You might be looking for agg, i.e.:
ndf = df.groupby('class').agg({'x':'mean','y':'mean','z':'mean','id':'first','other-numeric-field':'first'})
          id  other-numeric-field         x         z  y
class
0      3ba1d             0.431008  4.000000  2.500000  6
1      1014f             0.388640  6.666667  5.666667  6
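Since the question says the frame may have other columns, the dict passed to agg does not have to be typed out by hand. A minimal sketch, assuming every column outside ['x', 'y', 'z'] should keep its first occurrence (`mean_cols` and `agg_spec` are names made up for this example):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x": [8, 2, 5, 7, 6],
    "y": [8, 3, 1, 9, 9],
    "z": [5, 4, 6, 6, 1],
    "class": [1, 0, 1, 1, 0],
    "id": ["1014f", "3ba1d", "1014f", "1014f", "7a5d7"],
    "other-numeric-field": [0.388640, 0.431008, 0.388640, 0.388640, 0.476972],
})

mean_cols = ["x", "y", "z"]

# 'mean' for the columns of interest, 'first' for every other column
# (the grouping key itself is left out of the spec).
agg_spec = {
    col: ("mean" if col in mean_cols else "first")
    for col in df.columns
    if col != "class"
}

ndf = df.groupby("class").agg(agg_spec)
print(ndf)
```

Swapping 'first' for another aggregation such as 'last' only requires changing the fallback in the dict comprehension.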
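To apply this only to class zero, one approach is to concatenate the untouched rows with the aggregated row, i.e.:
import pandas as pd
ndf = df.groupby('class',as_index=False).agg({'x':'mean','y':'mean','z':'mean','id':'first','other-numeric-field':'first'})
# pd.concat replaces DataFrame.append (removed in pandas 2.0); column order may differ from the output shown
sdf = pd.concat([df[df['class'].ne(0)], ndf[ndf['class'].eq(0)]], ignore_index=True)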
   class     id  other-numeric-field    x  y    z
0      1  1014f             0.388640  8.0  8  5.0
1      1  1014f             0.388640  5.0  1  6.0
2      1  1014f             0.388640  7.0  9  6.0
3      0  3ba1d             0.431008  4.0  6  2.5
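The same idea can be wrapped in a small helper that collapses only selected classes and leaves the rest untouched. This is a sketch, not a pandas API: `collapse_classes` and its parameters are hypothetical names, and it assumes non-mean columns should keep their first occurrence.

```python
import pandas as pd


def collapse_classes(df, classes, mean_cols, key="class"):
    """Hypothetical helper: collapse rows of the given classes into one row each."""
    # 'mean' for the requested columns, 'first' for the others.
    agg_spec = {
        col: ("mean" if col in mean_cols else "first")
        for col in df.columns
        if col != key
    }
    selected = df[key].isin(classes)
    collapsed = df[selected].groupby(key, as_index=False).agg(agg_spec)
    # Untouched rows first, then one aggregated row per selected class.
    return pd.concat([df[~selected], collapsed], ignore_index=True)


# Sample data copied from the question.
df = pd.DataFrame({
    "x": [8, 2, 5, 7, 6],
    "y": [8, 3, 1, 9, 9],
    "z": [5, 4, 6, 6, 1],
    "class": [1, 0, 1, 1, 0],
    "id": ["1014f", "3ba1d", "1014f", "1014f", "7a5d7"],
    "other-numeric-field": [0.388640, 0.431008, 0.388640, 0.388640, 0.476972],
})

sdf = collapse_classes(df, classes=[0], mean_cols=["x", "y", "z"])
print(sdf)
```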
Upvotes: 4
Reputation: 210832
Is that what you want?
In [18]: df[['x','y','z']] = df.groupby('class')[['x','y','z']].transform('mean')
In [19]: df
Out[19]:
          x  y         z  class     id  other-numeric-field
0  6.666667  6  5.666667      1  1014f             0.388640
1  4.000000  6  2.500000      0  3ba1d             0.431008
2  6.666667  6  5.666667      1  1014f             0.388640
3  6.666667  6  5.666667      1  1014f             0.388640
4  4.000000  6  2.500000      0  7a5d7             0.476972
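Note that transform keeps one row per original row, while agg returns one row per class. A self-contained sketch of the transform route that works on a copy instead of overwriting df, and then optionally collapses to one row per class (the names `out` and `collapsed` are just for this example):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x": [8, 2, 5, 7, 6],
    "y": [8, 3, 1, 9, 9],
    "z": [5, 4, 6, 6, 1],
    "class": [1, 0, 1, 1, 0],
    "id": ["1014f", "3ba1d", "1014f", "1014f", "7a5d7"],
    "other-numeric-field": [0.388640, 0.431008, 0.388640, 0.388640, 0.476972],
})

cols = ["x", "y", "z"]

# Work on a copy so the original frame stays intact.
out = df.copy()
# transform returns a frame aligned to the original index,
# holding each row's class mean.
out[cols] = df.groupby("class")[cols].transform("mean")

# Optional: keep only the first row of each class.
collapsed = out.drop_duplicates(subset="class")

print(out)
print(collapsed)
```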
Upvotes: 5