Reputation: 429
I am trying to write a code to categorize a data set (from a CSV) based on the value range using the python pandas library. Aggregate functions can be used. But I am struggling on using aggregate functions.
+-------------+-------------+-------------+-------------+-------------+
|Name | Age |Region |Telephone |Address |
+-------------+-------------+-------------+-------------+-------------+
| | | | | |
I could develop the following code.
import pandas as pd
data_frame = pd.read_csv('5000 Records.csv')
data_frame['age_range'] = pd.cut(data_frame['Age in Yrs.'],
bins=[-float('inf'),30,50,float('inf')],
labels=['above', 'in between', 'below'])
data_frame = data_frame.groupby(['Region','age_range']).agg(
{
'age_range': "count"
}
)
print(data_frame)
but the result is as follows
age_range
Region age_range
Midwest above 312
in between 695
below 390
Northeast above 201
in between 421
below 219
South above 435
in between 983
below 452
West above 211
in between 443
below 238
But the requirement is to get the output as:
+-------------+-------------+-------------+-------------+
|Region | above |in between |below |
+-------------+-------------+-------------+-------------+
| | | | |
Can somebody please help me in doing this? Thanx in advance!
Upvotes: 1
Views: 232
Reputation: 862406
Use Series.unstack
with simplify groupby
solution - removed agg
and added GroupBy.size
.
GroupBy.count
is used for count with exclude missing values, here both solution working same, because age_range
is used for by
parameter in groupby
:
df = data_frame.groupby(['Region','age_range']).size().unstack(fill_value=0)
Or use crosstab
:
df = pd.crosstab(data_frame['Region'], data_frame['age_range'])
Upvotes: 2
Reputation: 3598
Try DataFrame.pivot
method:
data_frame.pivot(index='Region', columns='age_range', values='count')
Upvotes: 0