Reputation: 1
I have a dataframe column containing numbers in descending order, and I need to assign the lowest 10% of them to a new dataframe. I couldn't find a way to extract the lowest 10%. Thanks in advance.
The first thing I tried is the percentile function of numpy.
import numpy as np
import pandas as pd
df['Column']  # has 2400 numbers
array1 = np.array(df['Column'])
np.percentile(array1, 10)  # returns a single value (the 10th percentile), but I need the list of the lowest 10%
The second thing I tried is the qcut function of pandas.
pd.qcut(df['Column'], q=10)  # divides the column into 10 equal-sized bins, but I couldn't find a way to extract the lowest 10%
Upvotes: -1
Views: 2896
Reputation: 2905
If what you need is the rows that satisfy this condition, you can get them with a simple boolean mask. Let's walk through it:
df['Column'].quantile(0.1) returns the value at the 10% quantile of the column.
df['Column'].le(df['Column'].quantile(0.1)) (or equivalently, df['Column'] <= df['Column'].quantile(0.1)) returns a boolean Series holding True/False where the values match / don't match the condition. Such a series can be passed as an index to the df to filter only the desired rows. To sum it up, what you want is:
df_2 = df[df['Column'].le(df['Column'].quantile(0.1))]
EDITED: For the top 10%, similarly use
df_2 = df[df['Column'].ge(df['Column'].quantile(0.9))]
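To make the two filters above concrete, here is a minimal sketch on made-up data. The dataframe contents (2400 distinct integers in descending order) and the variable names low_cut, high_cut, bottom and top are assumptions for illustration only; they are not taken from the question's real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: 2400 numbers, descending.
df = pd.DataFrame({"Column": np.arange(2400, 0, -1)})

low_cut = df["Column"].quantile(0.1)   # value at the 10% quantile
high_cut = df["Column"].quantile(0.9)  # value at the 90% quantile

# Boolean masks select the rows at or beyond each cutoff.
bottom = df[df["Column"].le(low_cut)]
top = df[df["Column"].ge(high_cut)]

print(len(bottom), len(top))
```

Because the filter compares against a cutoff value, the number of rows returned depends on the data: with many duplicates around the cutoff, it can be more than 10% of the rows.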
EDITED (again, as per comment by OP):
If you need to get an exact number (e.g. exactly 10% of your dataset, regardless of duplicate values), you can sort the dataframe by the relevant column and pick the top/bottom n values (where n might be, for example, df.shape[0]//10), like this:
df_2 = df.sort_values('Column').tail(df.shape[0]//10) # top 10%
df_2 = df.sort_values('Column').head(df.shape[0]//10) # bottom 10%
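The difference between the quantile mask and the sort-based approach shows up when values repeat. The sketch below uses a small made-up column with duplicates (an assumption, not the OP's data). It also shows DataFrame.nsmallest as a possible alternative to sort_values(...).head(n); that is a swapped-in technique, not part of the original answer.

```python
import pandas as pd

# Hypothetical data with repeated values, 20 rows in total.
df = pd.DataFrame({"Column": [5, 4, 4, 3, 3, 2, 1, 1, 1, 1] * 2})

n = df.shape[0] // 10  # exact row count for 10% of the dataset

# Quantile mask: keeps every row tied with the cutoff value.
by_mask = df[df["Column"].le(df["Column"].quantile(0.1))]

# Sort and slice: always returns exactly n rows.
by_sort = df.sort_values("Column").head(n)

# Alternative (assumption: you only need the n smallest rows).
by_nsmallest = df.nsmallest(n, "Column")

print(len(by_mask), len(by_sort), len(by_nsmallest))
```

Which one is appropriate depends on whether ties at the cutoff should all be kept or whether the row count has to be exact.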
Upvotes: 4