thinwybk

Reputation: 4753

How to set the n smallest values of a 1D DataFrame to a specific value?

I have a 1D (single-row) DataFrame:

import pandas as pd

pd.DataFrame(
    columns=['A', 'B', 'C', 'D'],
    data=[[2.0, 3.0, 0.0, 1.0]],
)

which prints as:

     A    B    C    D
0  2.0  3.0  0.0  1.0

I'd like to set the n smallest values to 0.0. For example, with n = 3 I'd like to get:

     A    B    C    D
0  0.0  3.0  0.0  0.0

What's the most efficient implementation for this problem with respect to execution time and memory consumption?

Upvotes: 0

Views: 48

Answers (2)

Dani Mesejo

Reputation: 61910

Use numpy.argpartition:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    columns=['A', 'B', 'C', 'D'],
    data=[[2.0, 3.0, 0.0, 1.0]],
)

n = 3
# Positions of the n smallest values in the single row (their order is arbitrary)
indices = np.argpartition(df.iloc[0].to_numpy(), n)[:n]
df.iloc[0, indices] = 0

print(df)

Output

     A    B    C    D
0  0.0  3.0  0.0  0.0

The function argpartition uses introselect as its selection algorithm by default, so its worst-case running time is linear in the number of elements. From Wikipedia:

[Introselect] is a selection algorithm that is a hybrid of quickselect and median of medians which has fast average performance and optimal worst-case performance
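As a hedged sketch, the same idea can be wrapped in a small helper. The name zero_n_smallest and its value parameter are made up for illustration and are not part of pandas or NumPy. It assumes a single-row, all-numeric DataFrame, and it passes n - 1 as kth so that n equal to the number of columns does not raise an out-of-bounds error.

```python
import numpy as np
import pandas as pd


def zero_n_smallest(df, n, value=0.0):
    """Return a copy of a single-row DataFrame with its n smallest entries set to `value`.

    Hypothetical helper for illustration only.
    """
    row = df.iloc[0].to_numpy()
    out = df.copy()
    if n <= 0:
        return out  # nothing to replace
    if n >= row.size:
        out.iloc[0, :] = value  # every entry is among the n smallest
        return out
    # kth = n - 1 places the n-th smallest element at position n - 1,
    # with every element before it less than or equal to it.
    idx = np.argpartition(row, n - 1)[:n]
    out.iloc[0, idx] = value
    return out


df = pd.DataFrame(columns=['A', 'B', 'C', 'D'], data=[[2.0, 3.0, 0.0, 1.0]])
result = zero_n_smallest(df, 3)
print(result)
```

Working on a copy keeps the input unchanged. If in-place modification matters for memory, the copy can be dropped and df assigned to directly.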

Upvotes: 1

yatu

Reputation: 88226

Since DataFrame.nsmallest expects a columns argument, you could transpose and squeeze to get a Series, and then use Series.nsmallest to index the DataFrame with the resulting labels:

df[df.T.squeeze().nsmallest(3).index] = 0.

print(df)

     A    B    C    D
0  0.0  3.0  0.0  0.0
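For readers who prefer the steps spelled out, here is a minimal, self-contained sketch of the same approach. It takes the row with df.iloc[0] instead of transposing, which is an assumption about what reads more clearly, not a different technique. The variable name smallest_cols is illustrative.

```python
import pandas as pd

df = pd.DataFrame(columns=['A', 'B', 'C', 'D'], data=[[2.0, 3.0, 0.0, 1.0]])

n = 3
# df.iloc[0] is the single row as a Series indexed by column label
smallest_cols = df.iloc[0].nsmallest(n).index
# Assign by label to every column that holds one of the n smallest values
df.loc[:, smallest_cols] = 0.0

print(df)
```

Series.nsmallest also accepts a keep argument ('first', 'last' or 'all') that controls how ties are handled. With keep='all', more than n labels may be returned.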

Upvotes: 2
