Reputation: 183
I am working on binning categorical variables. The column I am working with is:
Adult.loc[:,"education"].value_counts()
HS-grad 10501
Some-college 7291
Bachelors 5355
Masters 1723
Assoc-voc 1382
11th 1175
Assoc-acdm 1067
10th 933
7th-8th 646
Prof-school 576
9th 514
12th 433
Doctorate 413
5th-6th 333
1st-4th 168
Preschool 51
I am trying to bin these values into 3 categories: No Highschool, Highschool, and College. I have run the following code:
Adult.loc[Adult.loc[:,"education"] == "Preschool", "education"]="No Highschool"
Adult.loc[Adult.loc[:,"education"] == "1st-4th", "education"]="No Highschool"
Adult.loc[Adult.loc[:,"education"] == "5th-6th", "education"]="No Highschool"
Adult.loc[Adult.loc[:,"education"] == "7th-8th", "education"]="No Highschool"
Adult.loc[Adult.loc[:,"education"] == "Prof-school", "education"]="Highschool"
Adult.loc[Adult.loc[:,"education"] == "9th", "education"]="Highschool"
Adult.loc[Adult.loc[:,"education"] == "10th", "education"]="Highschool"
Adult.loc[Adult.loc[:,"education"] == "11th", "education"]="Highschool"
Adult.loc[Adult.loc[:,"education"] == "12th", "education"]="Highschool"
Adult.loc[Adult.loc[:,"education"] == "HS-grad", "education"]="Highschool"
Adult.loc[Adult.loc[:,"education"] == "Some-college", "education"]="College"
Adult.loc[Adult.loc[:,"education"] == "Bachelors", "education"]="College"
Adult.loc[Adult.loc[:,"education"] == "Masters", "education"]="College"
Adult.loc[Adult.loc[:,"education"] == "Assoc-voc", "education"]="College"
Adult.loc[Adult.loc[:,"education"] == "Assoc-acdm", "education"]="College"
Adult.loc[Adult.loc[:,"education"] == "Doctorate", "education"]="College"
Is there a way to write a function that will bin these categorical variables? Writing one assignment per value does not seem like it would scale to a dataset with many different variables.
Upvotes: 1
Views: 657
Reputation: 2624
You can pass a dictionary to replace that maps each original value to its bin. Refer to the following example code.
import pandas as pd

df = pd.DataFrame({"x":['a', 'b', 'c', 'a', 'b']})
# Map each original value to the bin it should become
value_dict = {'a':'A', 'b':'A', 'c':'B'}
df['x'] = df['x'].replace(value_dict)
You just need to define your value_dict (e.g., {"Preschool":"No Highschool", ... "Doctorate":"College"}).
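To make this reusable as a function, one option is to write the bins as label-to-values groups and invert them into a value_dict before calling replace. The following is only a sketch: the names EDUCATION_BINS and bin_categories are hypothetical, the small adult DataFrame is a stand-in for the real data, and the grouping simply copies the assignments from the question (including Prof-school under "Highschool").

```python
import pandas as pd

# Hypothetical grouping, copied from the assignments in the question.
EDUCATION_BINS = {
    "No Highschool": ["Preschool", "1st-4th", "5th-6th", "7th-8th"],
    "Highschool": ["Prof-school", "9th", "10th", "11th", "12th", "HS-grad"],
    "College": ["Some-college", "Bachelors", "Masters",
                "Assoc-voc", "Assoc-acdm", "Doctorate"],
}


def bin_categories(series, bins):
    """Return a copy of `series` with each listed value replaced by its bin label."""
    # Invert {label: [values]} into {value: label} so replace can use it.
    value_dict = {value: label
                  for label, values in bins.items()
                  for value in values}
    return series.replace(value_dict)


# Small stand-in for the Adult data, used only for illustration.
adult = pd.DataFrame(
    {"education": ["HS-grad", "Bachelors", "Preschool", "9th", "Doctorate"]}
)
adult["education"] = bin_categories(adult["education"], EDUCATION_BINS)
binned = adult["education"].tolist()
```

Values that do not appear in any bin are left unchanged by replace. If you would rather have unlisted values become NaN so they are easy to spot, Series.map with the same dictionary behaves that way.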
Upvotes: 2