elemolotiv

Reputation: 181

Pandas: group by quantiles and calculate stats

I have the yearly income data of 99 people:

import pandas, random
incomes = pandas.DataFrame({'income':[round(random.triangular(20,80,200),0) for i in range(99)]}) 

How to:

Sorry, this sounds like a newbie question. I'm learning. Thank you!

Upvotes: 2

Views: 4919

Answers (1)

yatu

Reputation: 88226

To group the column as described, you can use Series.quantile, which accepts a sequence of quantiles and returns the corresponding values. Then use pd.cut to split the column into bins at those values.
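A minimal sketch of what these two calls return, on a small made-up Series (the sample values and the "low"/"mid"/"high" labels are illustrative assumptions, not from the question). It uses exact thirds (1/3 and 2/3) rather than 0.33 and 0.66:

```python
import pandas as pd

# Hypothetical sample data, only to show what each call returns
s = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90])

# Series.quantile takes a list of quantiles and returns one value per quantile
edges = s.quantile(q=[0, 1/3, 2/3, 1])
print(edges.values)  # minimum, the two tercile cut points, maximum

# pd.cut puts each value in the interval between consecutive edges;
# include_lowest=True closes the first interval on the left,
# so the minimum (10) lands in "low" instead of becoming NaN
groups = pd.cut(s, edges.values, labels=["low", "mid", "high"], include_lowest=True)
print(groups.value_counts(sort=False))  # each label gets 3 of the 9 values
```

Without include_lowest=True, pd.cut uses intervals that are open on the left, so the smallest value, which equals the first edge, would not fall into any bin.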

You can then use these "quantile groups" to compute statistics by grouping the dataframe, as below:

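import pandas as pd
# bin edges at the 0th, 33rd, 66th and 100th percentiles
quant = incomes.income.quantile(q=[0, 0.33, 0.66, 1]).values
# include_lowest=True keeps the smallest income in the first bin instead of NaN
incomes['groups'] = pd.cut(incomes.income, quant, labels=["poor", "middle", "rich"], include_lowest=True)
incomes['avg_income'] = incomes.groupby('groups', observed=True)['income'].transform('mean')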

Or, as @allolz mentions, you can use pd.qcut, which computes the quantile bins and assigns the labels in a single step:

incomes['groups'] = pd.qcut(incomes.income, 3, labels=['poor', 'middle', 'rich'])

print(incomes)

    income  groups  avg_income
0     96.0  middle   89.312500
1     77.0    poor   53.531250
2     93.0  middle   89.312500
3     86.0  middle   89.312500
4     59.0    poor   53.531250
..     ...     ...         ...
94    29.0    poor   53.531250
95   121.0    rich  112.823529
96    87.0  middle   89.312500
97   111.0    rich  112.823529
98    55.0    poor   53.531250
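The question title also asks for statistics per group. A minimal sketch, assuming the same kind of data as in the question (the fixed random.seed(0) is an added assumption so that runs are repeatable): grouping on the quantile labels and calling agg with a list of statistic names returns several statistics per group in one table.

```python
import random
import pandas as pd

# Data generated as in the question; random.seed(0) is an added assumption for repeatability
random.seed(0)
incomes = pd.DataFrame({'income': [round(random.triangular(20, 80, 200), 0) for i in range(99)]})

# Three quantile-based groups with roughly equal counts
incomes['groups'] = pd.qcut(incomes['income'], 3, labels=['poor', 'middle', 'rich'])

# One row per group, one column per statistic; observed=True keeps only groups that occur
stats = incomes.groupby('groups', observed=True)['income'].agg(['count', 'mean', 'median', 'min', 'max'])
print(stats)
```

Other reducer names that pandas accepts, such as 'std' or 'sum', can be added to the list passed to agg.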

Upvotes: 6
