Indi
Indi

Reputation: 429

How to categorize values of data frame in pandas?

I am trying to write a code to categorize a data set (from a CSV) based on the value range using the python pandas library. Aggregate functions can be used. But I am struggling on using aggregate functions.

    +-------------+-------------+-------------+-------------+-------------+
    |Name         | Age         |Region       |Telephone    |Address      |
    +-------------+-------------+-------------+-------------+-------------+
    |             |             |             |             |             |

I could develop the following code.

import pandas as pd

data_frame = pd.read_csv('5000 Records.csv')

data_frame['age_range'] = pd.cut(data_frame['Age in Yrs.'],
                             bins=[-float('inf'),30,50,float('inf')],
                             labels=['above', 'in between', 'below'])

data_frame = data_frame.groupby(['Region','age_range']).agg(
    {
        'age_range': "count"
    }
)

print(data_frame)

but the result is as follows

                      age_range
Region    age_range            
Midwest   above             312
          in between        695
          below             390
Northeast above             201
          in between        421
          below             219
South     above             435
          in between        983
          below             452
West      above             211
          in between        443
          below             238

But the requirement is to get the output as:

+-------------+-------------+-------------+-------------+
|Region       | above         |in between |below        |
+-------------+-------------+-------------+-------------+
|             |             |             |             | 

Can somebody please help me in doing this? Thanx in advance!

Upvotes: 1

Views: 232

Answers (2)

jezrael
jezrael

Reputation: 862406

Use Series.unstack with simplify groupby solution - removed agg and added GroupBy.size.

GroupBy.count is used for count with exclude missing values, here both solution working same, because age_range is used for by parameter in groupby:

df = data_frame.groupby(['Region','age_range']).size().unstack(fill_value=0)

Or use crosstab:

df = pd.crosstab(data_frame['Region'], data_frame['age_range'])

Upvotes: 2

ipj
ipj

Reputation: 3598

Try DataFrame.pivot method:

data_frame.pivot(index='Region', columns='age_range', values='count')

Upvotes: 0

Related Questions