Obtain observations based on percentile value in python pandas

Question

I have a data frame in the following form:

d1 = {'City_ID': ['City_1','City_1','City_1','City_1','City_2','City_3','City_3','City_3','City_3','City_3'], 
'Indiv_ID': ['Indiv_1','Indiv_2','Indiv_3','Indiv_4','Indiv_5','Indiv_6','Indiv_7','Indiv_8','Indiv_9','Indiv_10'],
'Expenditure_by_earning': [0.11, 0.66, 0.51, 0.43, 0.46,0.8, 0.14, 0.06, 0.64, 0.95]}

The real dataset contains over 1,000 cities with multiple individuals, although some cities contain only one observation. For each city, I would like to obtain the individuals whose expenditure-by-earning value is below the 25th percentile or above the 75th percentile for that city.

The output I would expect in this case:

City_ID     Indiv_ID    Expenditure_by_earning     Percentile
City_1      Indiv_1          0.11                      25
City_1      Indiv_2          0.66                      75
City_3      Indiv_8          0.06                      25
City_3      Indiv_7          0.14                      25
City_3      Indiv_6          0.8                       75
City_3      Indiv_10         0.95                      75

Note: City 2 gets eliminated.

Could someone help me achieve this in Python? Thanks.

Max Power · Accepted Answer

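import pandas as pd

# d1 in the question is a plain dict, so convert it to a DataFrame first
d1 = pd.DataFrame(d1)

# Calculate quantiles by city (result is indexed by city)
q25 = d1.groupby('City_ID')['Expenditure_by_earning'].quantile(.25)
q75 = d1.groupby('City_ID')['Expenditure_by_earning'].quantile(.75)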

# Calculate each row's residual relative to its city's quantiles
# (First index d1 by City_ID, like q25/q75, so the subtraction aligns on city)
d1 = d1.set_index('City_ID')
d1['Pct_75_resid'] = d1['Expenditure_by_earning'] - q75
d1['Pct_25_resid'] = d1['Expenditure_by_earning'] - q25

# Filter
d1.query('Pct_75_resid >= 0 or Pct_25_resid <=0')
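
Two things to watch for with the filter above: a city with a single observation (City_2 here) has both residuals equal to zero, so it passes the filter, and the result has no Percentile column. Below is a minimal sketch of one way to handle both, not the only approach. It assumes inclusive comparisons (which the expected output implies for City_3) and assumes that single-observation cities should be dropped. The names df, result, low, high and n_obs are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'City_ID': ['City_1', 'City_1', 'City_1', 'City_1', 'City_2',
                'City_3', 'City_3', 'City_3', 'City_3', 'City_3'],
    'Indiv_ID': ['Indiv_1', 'Indiv_2', 'Indiv_3', 'Indiv_4', 'Indiv_5',
                 'Indiv_6', 'Indiv_7', 'Indiv_8', 'Indiv_9', 'Indiv_10'],
    'Expenditure_by_earning': [0.11, 0.66, 0.51, 0.43, 0.46,
                               0.8, 0.14, 0.06, 0.64, 0.95],
})

col = 'Expenditure_by_earning'
grouped = df.groupby('City_ID')[col]

# transform() returns a Series aligned with df's rows, so no index juggling is needed
q25 = grouped.transform(lambda s: s.quantile(0.25))
q75 = grouped.transform(lambda s: s.quantile(0.75))
n_obs = grouped.transform('count')  # non-missing observations per city

low = df[col] <= q25    # at or below the city's 25th percentile
high = df[col] >= q75   # at or above the city's 75th percentile

# Assumption: cities with a single observation are excluded
keep = (n_obs > 1) & (low | high)

result = df.loc[keep].copy()
result['Percentile'] = np.where(low[keep], 25, 75)
result = (result
          .sort_values(['City_ID', 'Percentile', col])
          .reset_index(drop=True))
print(result)
```

If strict comparisons are wanted instead, swapping `<=` and `>=` for `<` and `>` would drop the rows that sit exactly on a quantile.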
