Reputation: 95
I'm having trouble replacing (with 0) just the first instance of the max value in each row of a DataFrame. For example:
NAME 1ST MONTH 2ND MONTH 3RD MONTH....
Joe 3 3 2
Erik 5 7 7
I need to replace only the first instance of the max value in every row of the df. The output I need is:
NAME 1ST MONTH 2ND MONTH 3RD MONTH....
Joe 0 3 2
Erik 5 0 7
But I'm using:
df_temp1.apply(lambda x: x.replace(max(x), 0), axis = 1)
And this gives me the following df:
NAME 1ST MONTH 2ND MONTH 3RD MONTH....
Joe 0 0 2
Erik 5 0 0
Upvotes: 2
Views: 269
Reputation: 862601
To improve performance, you can use NumPy: numpy.argmax gives the position of the first max value in each row, and indexing with those positions then sets them to 0:
arr = df.to_numpy()
# older pandas versions:
# arr = df.values
arr[np.arange(len(df)), np.argmax(arr, axis=1)] = 0
print (arr)
[[0 3 2]
 [5 0 7]]
df = pd.DataFrame(arr, index=df.index, columns=df.columns)
print (df)
1ST MONTH 2ND MONTH 3RD MONTH....
NAME
Joe 0 3 2
Erik 5 0 7
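For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same argmax idea. The helper name zero_first_row_max and the sample frame are my own for illustration, not from the answer above, and it assumes every column is numeric with no NaN values. It passes copy=True to to_numpy so the array is writable and the original frame is left untouched, which may matter on pandas versions where to_numpy can return a view.

```python
import numpy as np
import pandas as pd


def zero_first_row_max(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new frame with the first maximum of each row set to 0.

    Hypothetical helper; assumes all columns are numeric and NaN-free.
    """
    # copy=True: never modify the caller's data, and guarantee a writable array
    arr = df.to_numpy(copy=True)
    rows = np.arange(arr.shape[0])
    # argmax returns the position of the FIRST maximum in each row
    cols = arr.argmax(axis=1)
    arr[rows, cols] = 0
    return pd.DataFrame(arr, index=df.index, columns=df.columns)


# Sample data mirroring the question, with NAME as the index
df = pd.DataFrame(
    {"1ST MONTH": [3, 5], "2ND MONTH": [3, 7], "3RD MONTH": [2, 7]},
    index=pd.Index(["Joe", "Erik"], name="NAME"),
)
result = zero_first_row_max(df)
print(result)
```

If NAME is a regular column rather than the index, it would need to be moved to the index (or excluded) first, since argmax over mixed string and numeric values is not meaningful.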
# small DataFrame, 2k rows
df = pd.concat([df] * 1000, ignore_index=True)
In [174]: %%timeit
...: df.apply(lambda x: x.replace(x.nlargest(1), 0), axis=1)
...:
...:
3.11 s ± 311 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [175]: %%timeit
...: to_zero = df.idxmax(axis=1).to_dict()
...:
...: for idx, col in to_zero.items():
...:     df.loc[idx, col] = 0
...:
1.07 s ± 41.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [176]: %%timeit
...: arr = df.to_numpy()
...: arr[np.arange(len(df)), np.argmax(arr, axis=1)] = 0
...: pd.DataFrame(arr, index=df.index, columns=df.columns)
...:
213 µs ± 5.25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Upvotes: 0
Reputation: 4233
You can use nlargest() with replace():
df = pd.DataFrame([[3, 3, 2], [5, 7, 7]], columns=['a', 'b', 'c'])
df = df.apply(lambda x: x.replace(x.nlargest(1), 0), axis=1)
print(df)
a b c
0 0 3 2
1 5 0 7
Upvotes: 4
Reputation: 27869
You can go about it like this; hopefully there is a more elegant solution:
to_zero = df_temp1._get_numeric_data().idxmax(axis=1).to_dict()
for idx, col in to_zero.items():
    df_temp1.loc[idx, col] = 0
df_temp1
NAME 1ST MONTH 2ND MONTH 3RD MONTH....
Joe 0 3 2
Erik 5 0 7
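One possible loop-free variant of the same idxmax idea, offered as a sketch rather than a definitive answer: build a boolean mask that is True only at each row's first-max column, then zero those cells with mask(). It uses select_dtypes(include="number"), the public counterpart of the private _get_numeric_data(). The sample frame is made up to mirror the question, and the approach assumes column names are unique.

```python
import pandas as pd

# Hypothetical sample data mirroring the question; NAME is a regular column here.
df_temp1 = pd.DataFrame({
    "NAME": ["Joe", "Erik"],
    "1ST MONTH": [3, 5],
    "2ND MONTH": [3, 7],
    "3RD MONTH": [2, 7],
})

# Keep only the numeric columns (public alternative to _get_numeric_data()).
num = df_temp1.select_dtypes(include="number")

# Column label of the first maximum in each row.
first_max = num.idxmax(axis=1)

# Boolean mask, True only at (row, first-max column).
# Assumption: column names are unique, otherwise several cells per row would match.
col_labels = num.columns.to_numpy(dtype=object)[None, :]
row_targets = first_max.to_numpy(dtype=object)[:, None]
mask = pd.DataFrame(col_labels == row_targets, index=num.index, columns=num.columns)

# Write zeros at the masked positions and leave everything else unchanged.
df_temp1[num.columns] = num.mask(mask, 0)
print(df_temp1)
```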
Upvotes: 1