Reputation: 1325
I'm puzzled as to why the code below doesn't work. I created the column AgeBands with the pd.cut function, so the type is category. In theory, I should be able to subset on it like I would on a string column, but when I try that, the resulting dataframe new_df has zero rows. What am I missing?
import numpy as np
import pandas as pd
df = pd.DataFrame({'Age' : [22, 38, 26, 35, 35, 65]})
df['AgeBands'] = pd.cut(df['Age'], [0,10,20,30,40,50,max(df['Age'])])
new_df = df[df['AgeBands'] == '(30-40]']
new_df.shape
Running df.info() confirms that AgeBands is indeed of type category:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 2 columns):
Age 6 non-null int64
AgeBands 6 non-null category
dtypes: category(1), int64(1)
memory usage: 174.0 bytes
Upvotes: 2
Views: 1824
Reputation: 648
For clarity, you can assign a label to each bucket:
df['AgeBands'] = pd.cut(df['Age'], [0,10,20,30,40,50,max(df['Age'])], labels=range(1,7))
output:
Age AgeBands
0 22 3
1 38 4
2 26 3
3 35 4
4 35 4
5 65 6
Then filter with df[df['AgeBands'] == 3]:
Age AgeBands
0 22 3
2 26 3
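If numeric codes feel opaque, the labels parameter also accepts strings, which makes the filter read like an ordinary string comparison. A minimal sketch, assuming label names of my own choosing (they are hypothetical, not from the question):

```python
import pandas as pd

df = pd.DataFrame({'Age': [22, 38, 26, 35, 35, 65]})
bins = [0, 10, 20, 30, 40, 50, max(df['Age'])]

# Hypothetical label names, one per bin (7 edges give 6 bins)
labels = ['0-10', '10-20', '20-30', '30-40', '40-50', '50+']
df['AgeBands'] = pd.cut(df['Age'], bins, labels=labels)

# The categories are now plain strings, so a string comparison matches
new_df = df[df['AgeBands'] == '30-40']
print(new_df)
```

The number of labels has to equal the number of bins, which is one less than the number of bin edges.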
Upvotes: 1
Reputation: 1003
You misspelled the label that is in the df: it's '(30, 40]', not '(30-40]'.
import numpy as np
import pandas as pd
df = pd.DataFrame({'Age' : [22, 38, 26, 35, 35, 65]})
df['AgeBands'] = pd.cut(df['Age'], [0,10,20,30,40,50,max(df['Age'])])
new_df = df[df['AgeBands'] == '(30, 40]']
new_df
output
Age AgeBands
1 38 (30, 40]
3 35 (30, 40]
4 35 (30, 40]
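One caveat, as far as I know: in more recent pandas versions pd.cut stores the bins as Interval objects rather than strings, in which case a string comparison can come back empty. A sketch of two ways around that, assuming such a version. Printing df['AgeBands'].cat.categories first shows what the labels actually are:

```python
import pandas as pd

df = pd.DataFrame({'Age': [22, 38, 26, 35, 35, 65]})
df['AgeBands'] = pd.cut(df['Age'], [0, 10, 20, 30, 40, 50, max(df['Age'])])

# Show the actual category labels before filtering on them
print(df['AgeBands'].cat.categories)

# Option 1: compare against an Interval (closed on the right by default, like pd.cut)
by_interval = df[df['AgeBands'] == pd.Interval(30, 40)]

# Option 2: convert the labels to strings, then compare as text
by_string = df[df['AgeBands'].astype(str) == '(30, 40]']

print(by_interval)
```

Option 1 keeps the column categorical. Option 2 is closer to the original attempt but depends on the exact string formatting of the interval, including the comma and the space.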
Upvotes: 2