Reputation: 181
I have the yearly income data of 99 people:
import pandas, random
incomes = pandas.DataFrame({'income':[round(random.triangular(20,80,200),0) for i in range(99)]})
How to:
Sorry, sounds like a newbie question. I'm learning. Thank you!
Upvotes: 2
Views: 4919
Reputation: 88226
To group the column as described, you can use Series.quantile, which accepts a sequence of quantiles, and then pd.cut to split the column into bins at those values.
You can then use the resulting quantile groups to compute per-group statistics on the dataframe, as below:
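import pandas as pd
quant = incomes.income.quantile(q=[0, 0.33, 0.66, 1]).values
# include_lowest=True keeps the minimum income in the first bin instead of NaN
incomes['groups'] = pd.cut(incomes.income, quant, labels=["poor", "middle", "rich"], include_lowest=True)
# mean income of each group, repeated on every row of that group
incomes['avg_income'] = incomes.groupby('groups')['income'].transform('mean')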
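One detail of pd.cut worth checking: its bins are right-closed by default, so a value equal to the lowest edge (here, the minimum income) falls outside the first bin and comes back as NaN unless include_lowest=True is passed. A minimal sketch on a small made-up series (the values are hypothetical, not the question's data):

```python
import pandas as pd

# Hypothetical sample, chosen only to make the edge case visible.
s = pd.Series([20, 35, 50, 65, 80, 120], name="income")

# Bin edges at the 0%, 33%, 66% and 100% quantiles.
edges = s.quantile([0, 0.33, 0.66, 1]).values

labels = ["poor", "middle", "rich"]

# Default behaviour: the first bin is open on the left,
# so the minimum value (20) gets no label.
default_groups = pd.cut(s, edges, labels=labels)

# include_lowest=True closes the first bin on the left as well.
fixed_groups = pd.cut(s, edges, labels=labels, include_lowest=True)

print(default_groups.isna().sum())  # 1
print(fixed_groups.isna().sum())    # 0
```

pd.qcut does not need this flag, since it includes the minimum in the first bin on its own.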
Or, as @allolz mentions, you can use pd.qcut, which computes the quantile edges and bins the column in a single step:
incomes['groups'] = pd.qcut(incomes.income, 3, labels=['poor', 'middle', 'rich'])
print(incomes)
    income  groups  avg_income
0     96.0  middle   89.312500
1     77.0    poor   53.531250
2     93.0  middle   89.312500
3     86.0  middle   89.312500
4     59.0    poor   53.531250
..     ...     ...         ...
94    29.0    poor   53.531250
95   121.0    rich  112.823529
96    87.0  middle   89.312500
97   111.0    rich  112.823529
98    55.0    poor   53.531250
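If a compact per-group summary is more useful than a repeated avg_income column, groupby with agg gives one row per group. The following is only a sketch under assumptions: the income values are a deterministic stand-in (20 through 118) rather than the question's random data, and the choice of statistics is illustrative.

```python
import pandas as pd

# Stand-in data: 99 distinct incomes, so each tercile holds exactly 33 rows.
incomes = pd.DataFrame({"income": [float(x) for x in range(20, 119)]})

# Split into three equal-sized quantile groups in one step.
incomes["groups"] = pd.qcut(incomes["income"], 3, labels=["poor", "middle", "rich"])

# Per-row group mean, as in the answer above.
incomes["avg_income"] = (
    incomes.groupby("groups", observed=True)["income"].transform("mean")
)

# One row per group with several statistics at once.
summary = (
    incomes.groupby("groups", observed=True)["income"]
    .agg(["count", "mean", "min", "max"])
)
print(summary)
```

Here transform keeps the original row count (useful for adding a column), while agg collapses each group to a single row.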
Upvotes: 6