Saji Xavier
Saji Xavier

Reputation: 2360

Filter out data between two percentiles in python pandas

I have pandas Dataframe, i want to eliminate extreme values for a column. eg: I have pandas data frame called df, and have column called percentage in it. I want to filter out the data frame based on the following condition, eliminate first 10 percentile and last 10 percentile based on values in percentage column.

I want to filter it to 10th to 90th percentile.

I thought of the following,

df[(df.percentage > np.percentile(df.percentage, 10 )) & (df.percentage < np.percentile(df.percentage, 90 ))]

Is it the right approach ? or can someone recommend faster approach if any ?

Upvotes: 2

Views: 4682

Answers (2)

flurble
flurble

Reputation: 1106

The whole dataframe should be filtered on the percentiles of df['percentage']?

import numpy as np
import pandas as pd

df = pd.DataFrame({'percentage': abs(np.random.normal(loc=50, scale=30, size=100)),
                   'var1': np.random.rand(100),
                   'var2': np.random.rand(100),
                   'var3': np.random.rand(100)})

# Find out percentiles
lower = np.percentile(df['percentage'], 10)
upper = np.percentile(df['percentage'], 90)

# Select data between
trimmed = df[df.percentage.between(lower, upper)]

Upvotes: 4

ResidentSleeper
ResidentSleeper

Reputation: 2495

Try .quantile

p_10 = df.percentage.quantile(0.1)
p_90 = df.percentage.quantile(0.9)
df[df.percentage.gt(p_10) & df.percentage.lt(p_90)]

Upvotes: 6

Related Questions