Reputation: 4753
I have a single-row DataFrame
import pandas as pd
pd.DataFrame(
columns=['A', 'B', 'C', 'D'],
data=[[2.0, 3.0, 0.0, 1.0]],
)
which displays as
A B C D
0 2.0 3.0 0.0 1.0
I'd like to set the n smallest values to 0.0. E.g. with n = 3, I'd like to have
A B C D
0 0.0 3.0 0.0 0.0
What's the most efficient implementation for my problem w.r.t. execution time and memory consumption?
Upvotes: 0
Views: 48
Reputation: 61910
Use numpy.argpartition:
import numpy as np
import pandas as pd
df = pd.DataFrame(
columns=['A', 'B', 'C', 'D'],
data=[[2.0, 3.0, 0.0, 1.0]],
)
n = 3
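# positions of the n smallest values; kth = n - 1 also works when n equals the number of columns
indices = np.argpartition(df.iloc[0].to_numpy(), n - 1)[:n]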
df.iloc[0, indices] = 0
print(df)
Output
A B C D
0 0.0 3.0 0.0 0.0
The function argpartition uses introselect as its selection algorithm by default, which has O(n) worst-case performance in the number of elements. From Wikipedia:
Introselect is a selection algorithm that is a hybrid of quickselect and median of medians which has fast average performance and optimal worst-case performance
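For reuse, the partition step could be wrapped in a small helper. This is only a sketch: the name zero_n_smallest is hypothetical, and it assumes a single-row, all-numeric DataFrame with 1 <= n <= number of columns. It returns a copy instead of modifying the input.

```python
import numpy as np
import pandas as pd


def zero_n_smallest(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return a copy of a single-row DataFrame with its n smallest values set to 0.0.

    Hypothetical helper, not part of pandas or NumPy.
    """
    values = df.to_numpy(dtype=float, copy=True)  # shape (1, number of columns)
    if not 1 <= n <= values.shape[1]:
        raise ValueError("n must be between 1 and the number of columns")
    # kth = n - 1 moves the n smallest values into the first n positions
    # (in no particular order), without fully sorting the row
    positions = np.argpartition(values[0], n - 1)[:n]
    values[0, positions] = 0.0
    return pd.DataFrame(values, index=df.index, columns=df.columns)


df = pd.DataFrame(columns=['A', 'B', 'C', 'D'], data=[[2.0, 3.0, 0.0, 1.0]])
result = zero_n_smallest(df, 3)
print(result)
```

If the row contains ties at the cut-off, which of the equal values gets zeroed is not specified by argpartition.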
Upvotes: 1
Reputation: 88226
Since DataFrame.nsmallest expects a columns argument, you could transpose and squeeze the frame into a Series, then use Series.nsmallest and index the DataFrame with the resulting labels:
df[df.T.squeeze().nsmallest(3).index] = 0.
print(df)
A B C D
0 0.0 3.0 0.0 0.0
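To make the one-liner easier to follow, here is the same idea split into steps. This is a sketch on the question's example data; the intermediate names row, smallest and labels are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(columns=['A', 'B', 'C', 'D'], data=[[2.0, 3.0, 0.0, 1.0]])

row = df.T.squeeze()           # Series indexed by the column labels A..D
smallest = row.nsmallest(3)    # the 3 smallest values, in ascending order
labels = list(smallest.index)  # column labels of those values

df[labels] = 0.0               # assign 0.0 to the selected columns
print(labels)                  # ['C', 'D', 'A']
print(df)
```

Note that nsmallest returns its result sorted, which is more than this task strictly needs.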
Upvotes: 2