samyil

Reputation: 1

Python: Extracting the lower quantile from a DataFrame

I have a DataFrame column containing a set of numbers in descending order, and I need to assign the lowest 10% of them to a new DataFrame. However, I couldn't find a way to extract the lowest 10%. Thanks in advance.

The first function I tried is NumPy's percentile function.

import numpy as np
import pandas as pd


df['Column']  # contains 2400 numbers

array1 = np.array(df['Column'])

np.percentile(array1, 10)  # returns a single value (the 10th percentile), but I need the list of the lowest 10%

The second thing I tried is the qcut function of pandas.

pd.qcut(df['Column'], q=10)  # divides the column into 10 equal-sized bins, but I couldn't find a way to extract the lowest 10%

Upvotes: -1

Views: 2896

Answers (1)

Itamar Mushkin

Reputation: 2905

If what you need is the rows that satisfy this condition, you can get them with simple boolean indexing. Let's walk through it:

  1. To get the 10% quantile threshold, use df['Column'].quantile(0.1).
  2. To find the rows where this column is below or equal to this threshold, use df['Column'].le(df['Column'].quantile(0.1)) (or equivalently, df['Column'] <= df['Column'].quantile(0.1)).
  3. The previous expression returns a boolean Series whose index matches the df's index and whose values are True where the condition holds and False where it doesn't. Such a Series can be passed as an indexer to the df to keep only the desired rows.

To sum it up, what you want is:

df_2 = df[df['Column'].le(df['Column'].quantile(0.1))]
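As a minimal sketch of the three steps, here they are spelled out on a small made-up DataFrame. The data (the integers 20 down to 1) and the intermediate names threshold and mask are assumptions for illustration only, not part of the question:

```python
import pandas as pd

# Hypothetical data: 20 distinct values in descending order
df = pd.DataFrame({'Column': list(range(20, 0, -1))})

# Step 1: the 10% quantile of the column (linearly interpolated by default)
threshold = df['Column'].quantile(0.1)

# Step 2: boolean Series aligned with df's index, True where value <= threshold
mask = df['Column'].le(threshold)

# Step 3: use the boolean Series to keep only the matching rows
df_2 = df[mask]

print(threshold)
print(df_2)
```

With distinct values like these, the filter keeps 2 of the 20 rows. The count is only guaranteed to be about 10% when the values around the threshold are not duplicated.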

EDITED: For the top 10%, similarly use

df_2 = df[df['Column'].ge(df['Column'].quantile(0.9))]

EDITED (again, as per comment by OP):

If you need an exact number of rows (e.g. exactly 10% of your dataset, regardless of duplicate values), you can sort the DataFrame by the relevant column and pick the top/bottom n rows (where n might be, for example, df.shape[0]//10), like this:

df_2 = df.sort_values('Column').tail(df.shape[0]//10) # top 10%
df_2 = df.sort_values('Column').head(df.shape[0]//10) # bottom 10%
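To illustrate why the two approaches can differ, here is a small sketch on made-up data with many duplicate values. The data and the names by_threshold, exact and exact_alt are assumptions for illustration; nsmallest is shown only as an alternative way to write the sorted head, not as something used in the answer above:

```python
import pandas as pd

# Hypothetical data: 20 rows in descending order, with heavy duplication
df = pd.DataFrame({'Column': [5] * 10 + [1] * 10})

n = df.shape[0] // 10  # exactly 10% of the rows, rounded down (2 here)

# Threshold filter: every row tied at the threshold is kept
by_threshold = df[df['Column'].le(df['Column'].quantile(0.1))]

# Sort-and-slice: exactly n rows, ties cut arbitrarily by sort order
exact = df.sort_values('Column').head(n)

# Same selection written with DataFrame.nsmallest
exact_alt = df.nsmallest(n, 'Column')

print(len(by_threshold), len(exact), len(exact_alt))
```

Here the threshold filter returns all ten rows equal to 1, while the sorted slice returns exactly two of them. Which of the tied rows survive depends on the sort, so this approach suits cases where the row count matters more than which duplicates are chosen.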

Upvotes: 4
