Reputation: 732
I want to replace the maximum value in each row with that row's mean. The method I am using takes a long time to complete. I am using a pandas DataFrame.
The replacement mean needs to be an integer, rounded to the nearest whole number. For example, 3.2 becomes 3 and 3.8 becomes 4.
My slow solution:
for j in range(0, len(df_train)):
    val = df_train.iloc[j, 1:51].mean()
    m = df_train.iloc[j, 1:51].max()
    df_train.iloc[j, 1:51] = df_train.iloc[j, 1:51].replace(m, int(val))
My DataFrame:
id | feature0 | feature1 | feature2 | feature3 | feature4 |
---|---|---|---|---|---|
0 | 0 | 0 | 3 | 1 | 5 |
1 | 4 | 0 | 4 | 0 | 8 |
2 | 1 | 21 | 4 | 0 | 0 |
3 | 0 | 11 | 0 | 0 | 2 |
Output I want:
id | feature0 | feature1 | feature2 | feature3 | feature4 |
---|---|---|---|---|---|
0 | 0 | 0 | 3 | 1 | 2 |
1 | 4 | 0 | 4 | 0 | 3 |
2 | 1 | 5 | 4 | 0 | 0 |
3 | 0 | 3 | 0 | 0 | 2 |
Upvotes: 2
Views: 2087
Reputation: 41327
Do you happen to know if there is a way to do it over df itself (instead of the df.values numpy array)?
Use DataFrame.mask:
df = df.mask(
df.eq(df.max(axis=1), axis=0), # the mask (True locations will get replaced)
df.mean(axis=1).round(), # the replacements
axis=0) # replace by rows (each replacement value corresponds to one mask row)
# feature0 feature1 feature2 feature3 feature4
# 0 0 0 3 1 2
# 1 4 0 4 0 3
# 2 1 5 4 0 0
# 3 0 3 0 0 2
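For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same mask call. The intermediate names (row_max, row_mean, is_row_max, result) are mine, and the final astype(int) is an addition, since the question asks for integers and the rounded means come back as floats. Also note that round() follows NumPy and rounds exact halves to the nearest even integer (2.5 becomes 2); if halves must round up, np.floor(mean + 0.5) is one option.

```python
import pandas as pd

# Sample data from the question, with "id" as the index
df = pd.DataFrame(
    {
        "feature0": [0, 4, 1, 0],
        "feature1": [0, 0, 21, 11],
        "feature2": [3, 4, 4, 0],
        "feature3": [1, 0, 0, 0],
        "feature4": [5, 8, 0, 2],
    },
    index=pd.Index([0, 1, 2, 3], name="id"),
)

row_max = df.max(axis=1)            # one maximum per row
row_mean = df.mean(axis=1).round()  # one rounded mean per row (float dtype)

# True wherever a cell equals its own row's maximum
is_row_max = df.eq(row_max, axis=0)

# Replace the True cells with the row mean, then cast back to integers
result = df.mask(is_row_max, row_mean, axis=0).astype(int)
print(result)
```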
Advantages of DataFrame.mask:
For reference, the boolean mask:
df.eq(df.max(axis=1), axis=0)
# feature0 feature1 feature2 feature3 feature4
# 0 False False False False True
# 1 False False False False True
# 2 False True False False False
# 3 False True False False False
Note: To replace the column max by the column mean, just swap all the axis params:
df.mask(
df.eq(df.max(axis=0), axis=1),
df.mean(axis=0).round(),
axis=1)
# feature0 feature1 feature2 feature3 feature4
# 0 0 0 3 0 5
# 1 1 0 3 0 4
# 2 1 8 3 0 0
# 3 0 11 0 0 2
# 3 0 11 0 0 2
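The question's loop works on a column slice (iloc[j, 1:51]) of a frame where id is a regular column rather than the index. A rough sketch of how the row-wise mask could be applied to just those columns follows. The frame below is a small stand-in with five feature columns, so the slice is 1:6 instead of 1:51, and feature_cols and features are names I made up for illustration.

```python
import pandas as pd

# Stand-in for df_train: "id" is an ordinary column here
df_train = pd.DataFrame(
    {
        "id": [0, 1, 2, 3],
        "feature0": [0, 4, 1, 0],
        "feature1": [0, 0, 21, 11],
        "feature2": [3, 4, 4, 0],
        "feature3": [1, 0, 0, 0],
        "feature4": [5, 8, 0, 2],
    }
)

# Pick the feature columns by position (1:51 in the question's data)
feature_cols = list(df_train.columns[1:6])
features = df_train[feature_cols]

# Replace each row's max with the rounded row mean, only in those columns
df_train[feature_cols] = features.mask(
    features.eq(features.max(axis=1), axis=0),
    features.mean(axis=1).round(),
    axis=0,
).astype(int)
print(df_train)
```

This does the whole replacement in a few vectorized calls instead of one iloc assignment per row, which is where the loop spends its time.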
Upvotes: 2
Reputation: 18306
df.values[range(len(df.index)), np.argmax(df.values, axis=1)] = df.mean(axis=1).round()
np.argmax over the rows gives the position of each row's maximum. We then use fancy indexing into df.values and assign the row means (axis=1), rounded,
to get
feature0 feature1 feature2 feature3 feature4
id
0 0 0 3 1 2
1 4 0 4 0 3
2 1 5 4 0 0
3 0 3 0 0 2
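Two caveats, with a sketch. First, whether writing into df.values changes df depends on your dtypes and pandas version (it may hand back a copy or a read-only array), so the version below works on an explicit copy from to_numpy and builds a new DataFrame. Second, np.argmax returns only the first maximum in each row, so a row with tied maxima gets one cell replaced, whereas an equality mask replaces all of them. The function names replace_first_max and replace_all_max and the tied frame are made up for this illustration.

```python
import numpy as np
import pandas as pd


def replace_first_max(df):
    """Replace the first maximum of each row with the rounded row mean."""
    arr = df.to_numpy(copy=True)       # explicit copy, safe to modify
    rows = np.arange(arr.shape[0])
    cols = arr.argmax(axis=1)          # position of the first max per row
    arr[rows, cols] = arr.mean(axis=1).round().astype(arr.dtype)
    return pd.DataFrame(arr, index=df.index, columns=df.columns)


def replace_all_max(df):
    """Replace every cell equal to its row maximum with the rounded row mean."""
    is_row_max = df.eq(df.max(axis=1), axis=0)
    return df.mask(is_row_max, df.mean(axis=1).round(), axis=0).astype(int)


# One row with a tied maximum: the mean is 10 / 3, which rounds to 3
tied = pd.DataFrame({"a": [5], "b": [5], "c": [0]})

print(replace_first_max(tied))  # only the first 5 is replaced
print(replace_all_max(tied))    # both 5s are replaced
```

If your rows never contain ties, the two give the same result and it comes down to preference.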
Upvotes: 3