kingjames23

Reputation: 95

Replace only the first occurrence of the max value in each DataFrame row in pandas?

I'm having trouble replacing (with 0) just the first instance of the max value in each row of a DataFrame. For example:

NAME   1ST MONTH    2ND MONTH   3RD MONTH....
Joe        3            3            2
Erik       5            7            7

I need to replace only the first instance of the max value in every row of the df. The output I need is:

NAME   1ST MONTH    2ND MONTH   3RD MONTH....
Joe        0            3            2
Erik       5            0            7

But I'm using:

df_temp1.apply(lambda x: x.replace(max(x), 0), axis = 1)

This replaces every occurrence of the max value in a row, so it gives me the following df:

NAME   1ST MONTH    2ND MONTH   3RD MONTH....
Joe        0            0            2
Erik       5            0            0

Upvotes: 2

Views: 269

Answers (3)

jezrael

Reputation: 862601

To improve performance, you can use numpy: numpy.argmax returns the position of the first max value in each row, and indexing with those positions sets the values to 0:

arr = df.to_numpy()
# older pandas versions
# arr = df.values
arr[np.arange(len(df)), np.argmax(arr, axis=1)] = 0
print (arr)
[[0 3 2]
 [5 0 7]]

df = pd.DataFrame(arr, index=df.index, columns=df.columns)
print (df)
      1ST MONTH  2ND MONTH  3RD MONTH....
NAME                                     
Joe           0          3              2
Erik          5          0              7

# small DataFrame, 2k rows
df = pd.concat([df] * 1000, ignore_index=True)


In [174]: %%timeit
     ...: df.apply(lambda x: x.replace(x.nlargest(1), 0), axis=1)
     ...: 
     ...: 
3.11 s ± 311 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [175]: %%timeit
     ...: to_zero = df.idxmax(axis=1).to_dict()
     ...: 
     ...: for idx, col in to_zero.items():
     ...:     df.loc[idx, col] = 0
     ...:     
1.07 s ± 41.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [176]: %%timeit
     ...: arr = df.to_numpy()
     ...: arr[np.arange(len(df)), np.argmax(arr, axis=1)] = 0
     ...: pd.DataFrame(arr, index=df.index, columns=df.columns)
     ...: 
213 µs ± 5.25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
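
As a rough sketch, the NumPy approach above can be wrapped in a small helper that leaves the input untouched. The function name zero_first_row_max is hypothetical, and the sketch assumes every column is numeric and no row contains NaN (np.argmax treats NaN as the maximum, so such rows would need separate handling):

```python
import numpy as np
import pandas as pd


def zero_first_row_max(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with the first max of each row set to 0.

    Hypothetical helper; assumes all columns are numeric and NaN-free.
    """
    # Work on a copy so the caller's DataFrame is not modified.
    arr = frame.to_numpy(copy=True)
    rows = np.arange(arr.shape[0])
    # argmax returns the position of the first occurrence when there are ties.
    first_max_cols = arr.argmax(axis=1)
    arr[rows, first_max_cols] = 0
    return pd.DataFrame(arr, index=frame.index, columns=frame.columns)


# Sample data from the question, with NAME as the index.
df = pd.DataFrame(
    {"1ST MONTH": [3, 5], "2ND MONTH": [3, 7], "3RD MONTH": [2, 7]},
    index=pd.Index(["Joe", "Erik"], name="NAME"),
)
result = zero_first_row_max(df)
print(result)
```

If NAME is an ordinary column rather than the index, one option is to call df.set_index("NAME") before passing the frame in.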

Upvotes: 0

Abhi

Reputation: 4233

You can use nlargest() with replace():

df = pd.DataFrame([[3, 3, 2], [5, 7, 7]], columns=['a', 'b', 'c'])

df = df.apply(lambda x: x.replace(x.nlargest(1), 0), axis=1)

print(df)

    a   b   c
0   0   3   2
1   5   0   7

Upvotes: 4

zipa

Reputation: 27869

You can go about it like this; hopefully there is a more elegant solution:

to_zero = df_temp1._get_numeric_data().idxmax(axis=1).to_dict()

for idx, col in to_zero.items():
    df_temp1.loc[idx, col] = 0

df_temp1

NAME   1ST MONTH    2ND MONTH   3RD MONTH....
Joe        0            3            2
Erik       5            0            7
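
One possible loop-free variant, offered as a sketch rather than a drop-in replacement: build a boolean mask that is True only at the first max of each row, then apply it with DataFrame.mask. It assumes NAME is an ordinary column and swaps the private _get_numeric_data() for the public select_dtypes; the variable names are illustrative.

```python
import pandas as pd

# Sample data from the question, with NAME kept as a regular column.
df_temp1 = pd.DataFrame(
    {
        "NAME": ["Joe", "Erik"],
        "1ST MONTH": [3, 5],
        "2ND MONTH": [3, 7],
        "3RD MONTH": [2, 7],
    }
)

# Only the numeric columns take part in the row-wise max.
numeric = df_temp1.select_dtypes(include="number")

# True wherever a cell equals its row's max (ties give several True values).
is_max = numeric.eq(numeric.max(axis=1), axis=0)

# The running count of True values along each row is 1 at the first max only.
first_max = is_max & is_max.astype(int).cumsum(axis=1).eq(1)

# Write 0 at the masked positions and put the numeric columns back.
out = df_temp1.copy()
out[numeric.columns] = numeric.mask(first_max, 0)
print(out)
```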

Upvotes: 1
