Reputation: 167
I have a data frame with an 'education' attribute. Values are discrete, 1-16. For purposes of cross-tabulation, I want to bin this 'education' variable but with custom bins (1:8, 9:11, 12, 13:15, 16).
I've been experimenting with pd.cut(), but the following line raises an invalid syntax error:
adult_df_educrace['education_bins'] = pd.cut(x=adult_df_educrace['education'], bins=[1:8, 9, 10:11, 12, 13:15, 16], labels = ['Middle School or less', 'Some High School', 'High School Grad', 'Some College', 'College Grad'])
Upvotes: 0
Views: 232
Reputation: 150745
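pd.cut expects bins to be a list of numeric edges, not ranges. Writing 1:8 inside a list literal is what raises the syntax error, because that slice notation is only valid inside an indexing expression. Try making the bin edges fall between the thresholds: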
bins = [0.5, 8.5, 11.5, 12.5, 15.5, 16.5]
labels = ['Middle School or less', 'Some High School',
          'High School Grad', 'Some College', 'College Grad']
adult_df_educrace['education_bins'] = pd.cut(x=adult_df_educrace['education'],
                                             bins=bins,
                                             labels=labels)
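Test data (create this frame first, then run the snippet above):
import numpy as np
import pandas as pd
adult_df_educrace = pd.DataFrame({'education': np.arange(1, 17)})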
Output:
    education         education_bins
0           1  Middle School or less
1           2  Middle School or less
2           3  Middle School or less
3           4  Middle School or less
4           5  Middle School or less
5           6  Middle School or less
6           7  Middle School or less
7           8  Middle School or less
8           9       Some High School
9          10       Some High School
10         11       Some High School
11         12       High School Grad
12         13           Some College
13         14           Some College
14         15           Some College
15         16           College Grad
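As a side note, the half-integer edges are not strictly required. By default pd.cut uses right-closed intervals (right=True), so integer edges should give the same grouping for integer-coded data. The sketch below is only an illustration on the same toy frame as the test; the variable names df, half_edges, int_edges, same and counts are made up here and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Toy frame with education codes 1..16 (same data as the test above).
df = pd.DataFrame({'education': np.arange(1, 17)})

labels = ['Middle School or less', 'Some High School',
          'High School Grad', 'Some College', 'College Grad']

# Half-integer edges, as suggested in the answer.
half_edges = pd.cut(df['education'],
                    bins=[0.5, 8.5, 11.5, 12.5, 15.5, 16.5],
                    labels=labels)

# Integer edges: with right=True (the default) each bin is (lo, hi],
# so (0, 8] holds 1..8, (8, 11] holds 9..11, (11, 12] holds 12, and so on.
int_edges = pd.cut(df['education'],
                   bins=[0, 8, 11, 12, 15, 16],
                   labels=labels)

# Compare the two results label by label.
same = (half_edges.astype(str) == int_edges.astype(str)).all()
print(same)

# Number of education codes that landed in each bin.
counts = int_edges.value_counts().to_dict()
print(counts)
```

The lowest edge has to sit below the smallest value (0 here, not 1), because the left edge of the first interval is excluded unless include_lowest=True is passed.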
Upvotes: 1