Tomi Gelo

Reputation: 97

Replace values in rows higher than a column value

This seems like an easy one, but I just can't figure it out. Given the following dataset:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(50, 3)), columns=list('ABC'))
df['Mean'] = df.mean(axis=1)

    A   B   C   Mean
0   26  6   73  35.000000
1   89  55  29  57.666667
2   8   89  87  61.333333
3   83  25  64  57.333333
4   35  89  97  73.666667

How can I replace with 0 every value in a row that is higher than the value in that row's Mean column?

Desired output:

    A   B   C   Mean
0   26  6   0   35.000000
1   0   55  29  57.666667
2   8   0   0   61.333333
3   0   25  0   57.333333
4   35  0   0   73.666667

I have tried this:

df.apply(lambda x: 0 if x > df['Mean'] else x)

This raises a ValueError:

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index A')

Upvotes: 2

Views: 156

Answers (2)

Kenan

Reputation: 14094

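This should work if df holds only the columns A, B and C. The comparison has to run along the index (axis=0) so that each row is checked against its own mean; comparing against a plain list would align it with the columns instead:

df[df.gt(df.mean(axis=1), axis=0)] = 0

Or, if you want to use the existing Mean column:

df[df.gt(df['Mean'], axis=0)] = 0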
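
A minimal, self-contained sketch of the row-wise comparison, run on the five sample rows from the question so the result can be checked against the desired output. The `value_cols` list and the `above_mean` name are assumptions added for illustration; restricting the update to `value_cols` simply keeps the Mean column out of the comparison.

```python
import pandas as pd

# The five sample rows shown in the question
df = pd.DataFrame({
    "A": [26, 89, 8, 83, 35],
    "B": [6, 55, 89, 25, 89],
    "C": [73, 29, 87, 64, 97],
})
df["Mean"] = df.mean(axis=1)

# Assumption: only these columns should be zeroed, never Mean itself
value_cols = ["A", "B", "C"]

# axis=0 aligns the Mean Series with the row index,
# so each row is compared with its own mean
above_mean = df[value_cols].gt(df["Mean"], axis=0)

# mask() replaces the cells where the condition is True
df[value_cols] = df[value_cols].mask(above_mean, 0)
print(df)
```

With the sample data this should reproduce the desired output from the question.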

Upvotes: 1

nimrodz

Reputation: 1594

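I think this could work:

# Compare each column with the Mean column and zero the values above it
for i in range(df.shape[1]):
    mask = df.iloc[:,i] > df.Mean  # True where the value exceeds the row's mean
    column_name = df.columns[i]
    df.loc[mask, column_name] = 0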
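
For reference, the apply call in the question fails because apply passes a whole column to the lambda by default, so `x > df['Mean']` is a boolean Series and `if` cannot reduce it to a single True or False. A hedged sketch of a row-wise variant follows; it uses only the first three sample rows, and the `result` name is an assumption for illustration.

```python
import pandas as pd

# First three sample rows from the question
df = pd.DataFrame({"A": [26, 89, 8], "B": [6, 55, 89], "C": [73, 29, 87]})
df["Mean"] = df.mean(axis=1)

# axis=1 passes one row at a time; where() keeps the values that are
# not above the row's Mean and replaces the others with 0
result = df.apply(lambda row: row.where(row <= row["Mean"], 0), axis=1)
print(result)
```

Row-wise apply is usually slower than a vectorized comparison on large frames, and the integer columns are likely to come back as floats because each row is handled as a single Series.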

Upvotes: 0
