Subset dataframe on a column with type = category

Question

I'm puzzled as to why the code below doesn't work. I created the column AgeBands with the pd.cut function, so its type is category. In theory, I should be able to subset on it as I would on a string column, but when I try, the resulting dataframe new_df has zero rows. What am I missing?

import numpy as np
import pandas as pd

df = pd.DataFrame({'Age' : [22, 38, 26, 35, 35, 65]})
df['AgeBands'] = pd.cut(df['Age'], [0,10,20,30,40,50,max(df['Age'])])

new_df = df[df['AgeBands'] == '(30-40]']
new_df.shape

Running df.info() confirms that AgeBands is indeed of type category:


RangeIndex: 6 entries, 0 to 5
Data columns (total 2 columns):
Age         6 non-null int64
AgeBands    6 non-null category
dtypes: category(1), int64(1)
memory usage: 174.0 bytes

Tbaki · Accepted Answer

You misspelled the label that is actually in the df: it's '(30, 40]' (comma and space), not '(30-40]'.

import numpy as np
import pandas as pd

df = pd.DataFrame({'Age' : [22, 38, 26, 35, 35, 65]})
df['AgeBands'] = pd.cut(df['Age'], [0,10,20,30,40,50,max(df['Age'])])

new_df = df[df['AgeBands'] == '(30, 40]']
new_df

Output

    Age AgeBands
1   38  (30, 40]
3   35  (30, 40]
4   35  (30, 40]
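
One caveat: on recent pandas versions, pd.cut produces categories that are Interval objects rather than strings, so a plain string comparison can still return zero rows even with the correct spelling. Below is a minimal sketch of two ways around that, assuming such a version: compare against the string form of the column, or compare against a pd.Interval. The names by_string and by_interval are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Age': [22, 38, 26, 35, 35, 65]})
df['AgeBands'] = pd.cut(df['Age'], [0, 10, 20, 30, 40, 50, max(df['Age'])])

# Inspect the exact category labels before filtering on them
print(df['AgeBands'].cat.categories)

# Option 1: compare against the string form of each band
by_string = df[df['AgeBands'].astype(str) == '(30, 40]']

# Option 2: compare against an Interval object
# (closed on the right, matching the default of pd.cut)
by_interval = df[df['AgeBands'] == pd.Interval(30, 40, closed='right')]

print(by_string)
print(by_interval)
```

Printing the categories first is a quick way to see exactly what the labels look like, which avoids the kind of mismatch in the question.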
