bogdanCsn

Reputation: 1325

Subset dataframe on a column with type = category

I'm puzzled as to why the code below doesn't work. I created the column AgeBands with the pd.cut function, so its type is category. In theory, I should be able to subset on it as I would on a string column, but when I try that, the resulting dataframe new_df has zero rows. What am I missing?

import numpy as np
import pandas as pd

df = pd.DataFrame({'Age' : [22, 38, 26, 35, 35, 65]})
df['AgeBands'] = pd.cut(df['Age'], [0,10,20,30,40,50,max(df['Age'])])

new_df = df[df['AgeBands'] == '(30-40]']
new_df.shape

When running df.info() I have confirmation the AgeBands is indeed of type category:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 2 columns):
Age         6 non-null int64
AgeBands    6 non-null category
dtypes: category(1), int64(1)
memory usage: 174.0 bytes

Upvotes: 2

Views: 1824

Answers (2)

ammy

Reputation: 648

To make the buckets easier to work with, you can set explicit labels for the bin ranges:

df['AgeBands'] = pd.cut(df['Age'], [0,10,20,30,40,50,max(df['Age'])], labels=range(1,7))

Output:

   Age AgeBands
0   22        3
1   38        4
2   26        3
3   35        4
4   35        4
5   65        6

Filtering with df[df['AgeBands'] == 3] then returns:

   Age AgeBands
0   22        3
2   26        3
Upvotes: 1

Tbaki

Reputation: 1003

The string you are comparing against doesn't match the label in the df: it's '(30, 40]', not '(30-40]'.

import numpy as np
import pandas as pd

df = pd.DataFrame({'Age' : [22, 38, 26, 35, 35, 65]})
df['AgeBands'] = pd.cut(df['Age'], [0,10,20,30,40,50,max(df['Age'])])

new_df = df[df['AgeBands'] == '(30, 40]']
new_df

Output:

    Age AgeBands
1   38  (30, 40]
3   35  (30, 40]
4   35  (30, 40]

Upvotes: 2
