Sudhanshu

Reputation: 732

Replace maximum value of each row with mean

I want to replace the maximum value in each row with the mean value of that row. The method I am using takes a long time to complete. I am using a pandas DataFrame. The replacement value needs to be an integer, rounded to the nearest whole number. Example: if the mean is 3.2 it becomes 3, and if it is 3.8 it becomes 4.

My slow solution:

for j in range(len(df_train)):
    row = df_train.iloc[j, 1:51]   # feature columns of row j
    val = row.mean()
    m = row.max()
    # round to the nearest integer (int(val) alone would truncate 3.8 to 3)
    df_train.iloc[j, 1:51] = row.replace(m, int(round(val)))

My DataFrame:

id feature0 feature1 feature2 feature3 feature4
0 0 0 3 1 5
1 4 0 4 0 8
2 1 21 4 0 0
3 0 11 0 0 2

Output I want:

id feature0 feature1 feature2 feature3 feature4
0 0 0 3 1 2
1 4 0 4 0 3
2 1 5 4 0 0
3 0 3 0 0 2

Upvotes: 2

Views: 2087

Answers (2)

tdy

Reputation: 41327

Do you happen to know if there is a way to do it over df itself (instead of the df.values numpy array)?

Use DataFrame.mask:

df = df.mask(
    df.eq(df.max(axis=1), axis=0), # the mask (True locations will get replaced)
    df.mean(axis=1).round(),       # the replacements
    axis=0)                        # replace by rows (each replacement value corresponds to one mask row)

#    feature0  feature1  feature2  feature3  feature4
# 0         0         0         3         1         2
# 1         4         0         4         0         3
# 2         1         5         4         0         0
# 3         0         3         0         0         2
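A note on the .round() used above: pandas rounds exact halves to the nearest even number, so a mean of 2.5 becomes 2 rather than 3. If halves should always round up, one possible workaround for non-negative values is np.floor(x + 0.5). This is only a sketch; the means Series below is made-up illustration data, not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical row means, including two exact halves
means = pd.Series([2.5, 3.2, 3.5, 3.8])

# Series.round() rounds exact halves to the nearest even integer
half_even = means.round().astype(int)

# floor(x + 0.5) rounds exact halves up (intended for non-negative values)
half_up = np.floor(means + 0.5).astype(int)

print(half_even.tolist())
print(half_up.tolist())
```

For the sample data in the question no row mean is an exact half, so both variants give the same result there.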

Advantages of DataFrame.mask:

  • can handle ties (whereas the numpy approach will only replace the first one if tied)
  • can chain with other methods (whereas the numpy approach forces you to modify in place)
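To illustrate the point about ties, here is a small sketch with a made-up frame (columns a, b, c are hypothetical, not from the question) in which row 0 contains its maximum twice. The argmax variant works on a copy of the array so the original frame is left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 has a tied maximum (5 appears twice)
df = pd.DataFrame({"a": [5, 1], "b": [5, 7], "c": [2, 4]})

row_max = df.max(axis=1)
row_mean = df.mean(axis=1).round()  # both rows sum to 12, so the mean is 4.0

# mask: every cell equal to its row maximum gets replaced
masked = df.mask(df.eq(row_max, axis=0), row_mean, axis=0).astype(int)

# argmax: only the first maximum in each row gets replaced
arr = df.to_numpy().copy()
arr[np.arange(len(df)), arr.argmax(axis=1)] = row_mean.to_numpy()
first_only = pd.DataFrame(arr, index=df.index, columns=df.columns)

print(masked)
print(first_only)
```

In row 0, masked replaces both 5s, while first_only replaces only the one in column a.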

For reference, the boolean mask:

df.eq(df.max(axis=1), axis=0)

#    feature0  feature1  feature2  feature3  feature4
# 0     False     False     False     False      True
# 1     False     False     False     False      True
# 2     False      True     False     False     False
# 3     False      True     False     False     False

Note: To replace the column max by column mean, just swap all the axis params:

df.mask(
    df.eq(df.max(axis=0), axis=1),
    df.mean(axis=0).round(),
    axis=1)

#    feature0  feature1  feature2  feature3  feature4
# 0         0         0         3         0         5
# 1         1         0         3         0         4
# 2         1         8         3         0         0
# 3         0        11         0         0         2

Upvotes: 2

Mustafa Aydın

Reputation: 18306

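import numpy as np

df.values[range(len(df.index)), np.argmax(df.values, axis=1)] = df.mean(axis=1).round()

np.argmax along axis=1 gives the column position of the maximum value in each row. We then use fancy indexing into df.values and assign the rounded row means (axis=1) to those positions. Note that this writes through df.values, so it relies on that array being a writable view of the frame's data (all columns of one dtype, and Copy-on-Write not enabled).

This gives: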

    feature0  feature1  feature2  feature3  feature4
id
0          0         0         3         1         2
1          4         0         4         0         3
2          1         5         4         0         0
3          0         3         0         0         2
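If writing through df.values is not an option, the same argmax idea can be applied to a copy of the array and wrapped in a new frame. The following is a sketch of that variant, using the sample data from the question with id as the index; the names arr, rows, cols and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with "id" as the index
df = pd.DataFrame(
    {
        "feature0": [0, 4, 1, 0],
        "feature1": [0, 0, 21, 11],
        "feature2": [3, 4, 4, 0],
        "feature3": [1, 0, 0, 0],
        "feature4": [5, 8, 0, 2],
    },
    index=pd.Index([0, 1, 2, 3], name="id"),
)

arr = df.to_numpy().copy()      # work on a copy, leave df untouched
rows = np.arange(arr.shape[0])  # one row position per row
cols = arr.argmax(axis=1)       # column position of the first maximum per row

# The right-hand side is evaluated before the assignment, so the means
# are computed from the original values
arr[rows, cols] = np.round(arr.mean(axis=1)).astype(arr.dtype)

result = pd.DataFrame(arr, index=df.index, columns=df.columns)
print(result)
```

As with the one-liner, only the first maximum in each row is replaced when there are ties.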

Upvotes: 3
