Reputation: 97
This seems to be an easy one but I just can't figure it out. So, given the following dataset:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(50, 3)), columns=list('ABC'))
df['Mean'] = df.mean(axis=1)
    A   B   C       Mean
0  26   6  73  35.000000
1  89  55  29  57.666667
2   8  89  87  61.333333
3  83  25  64  57.333333
4  35  89  97  73.666667
How can I replace with 0 all values in a row that are higher than the value in that row's Mean column?
Desired output:
    A   B   C       Mean
0  26   6   0  35.000000
1   0  55  29  57.666667
2   8   0   0  61.333333
3   0  25   0  57.333333
4  35   0   0  73.666667
I have tried this:
df.apply(lambda x: 0 if x > df['Mean'] else x)
Which results in the ValueError:
ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index A')
Upvotes: 2
Views: 156
Reputation: 14094
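This should work. The comparison has to run along the row axis, so use gt(..., axis=0); comparing df against a plain list aligns the list with the columns instead, which raises a length error here because the list has one entry per row.
cols = ['A', 'B', 'C']
df[cols] = df[cols].mask(df[cols].gt(df[cols].mean(axis=1), axis=0), 0)
Or, if you want to use the Mean column:
df[cols] = df[cols].mask(df[cols].gt(df['Mean'], axis=0), 0)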
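A minimal sketch of the row-wise comparison, run on the five sample rows printed in the question. The names cols and above are just illustrative, and it assumes only A, B and C should be modified while Mean stays as it is:

```python
import pandas as pd

# First five rows copied from the question's printed frame.
df = pd.DataFrame(
    {"A": [26, 89, 8, 83, 35], "B": [6, 55, 89, 25, 89], "C": [73, 29, 87, 64, 97]}
)
df["Mean"] = df.mean(axis=1)

cols = ["A", "B", "C"]  # value columns only; "Mean" is left untouched

# gt(..., axis=0) aligns the Mean Series with the row index, so every
# value is compared against the mean of its own row.
above = df[cols].gt(df["Mean"], axis=0)

# mask() replaces the entries where the condition is True.
df[cols] = df[cols].mask(above, 0)

print(df)
```

Restricting the comparison to cols also avoids comparing the Mean column against itself.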
Upvotes: 1
Reputation: 1594
I think this could work:
for i in range(df.shape[1]):
    # rows where this column's value exceeds the row mean
    mask = df.iloc[:, i] > df['Mean']
    column_name = df.columns[i]
    df.loc[mask, column_name] = 0
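If the loop turns out to be slow on a larger frame, the same column-by-column comparison can probably be done in one step with NumPy broadcasting. This is only a sketch on three of the question's sample rows; cols, values and row_means are names made up for the example:

```python
import numpy as np
import pandas as pd

# Three sample rows taken from the question.
df = pd.DataFrame({"A": [26, 89, 8], "B": [6, 55, 89], "C": [73, 29, 87]})
df["Mean"] = df.mean(axis=1)

cols = ["A", "B", "C"]
values = df[cols].to_numpy()

# Reshape the means to (n_rows, 1) so they broadcast across the columns.
row_means = df["Mean"].to_numpy()[:, None]

# Keep a value where it is not above its row mean, otherwise write 0.
df[cols] = np.where(values > row_means, 0, values)

print(df)
```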
Upvotes: 0