Greg Sullivan

Reputation: 183

How to create a function that bins categorical variables

I am working on binning categorical variables. The column I am working with is:

Adult.loc[:,"education"].value_counts()
HS-grad         10501
Some-college     7291
Bachelors        5355
Masters          1723
Assoc-voc        1382
11th             1175
Assoc-acdm       1067
10th              933
7th-8th           646
Prof-school       576
9th               514
12th              433
Doctorate         413
5th-6th           333
1st-4th           168
Preschool          51

I am trying to bin these values into 3 categories: No Highschool, Highschool, and College. I have run the following code:

Adult.loc[Adult.loc[:,"education"] == "Preschool", "education"]="No Highschool" 
Adult.loc[Adult.loc[:,"education"] == "1st-4th", "education"]="No Highschool"
Adult.loc[Adult.loc[:,"education"] == "5th-6th", "education"]="No Highschool"
Adult.loc[Adult.loc[:,"education"] == "7th-8th", "education"]="No Highschool"
Adult.loc[Adult.loc[:,"education"] == "Prof-school", "education"]="Highschool"
Adult.loc[Adult.loc[:,"education"] == "9th", "education"]="Highschool"
Adult.loc[Adult.loc[:,"education"] == "10th", "education"]="Highschool"
Adult.loc[Adult.loc[:,"education"] == "11th", "education"]="Highschool"
Adult.loc[Adult.loc[:,"education"] == "12th", "education"]="Highschool"
Adult.loc[Adult.loc[:,"education"] == "HS-grad", "education"]="Highschool"
Adult.loc[Adult.loc[:,"education"] == "Some-college", "education"]="College"
Adult.loc[Adult.loc[:,"education"] == "Bachelors", "education"]="College"
Adult.loc[Adult.loc[:,"education"] == "Masters", "education"]="College"
Adult.loc[Adult.loc[:,"education"] == "Assoc-voc", "education"]="College"
Adult.loc[Adult.loc[:,"education"] == "Assoc-acdm", "education"]="College"
Adult.loc[Adult.loc[:,"education"] == "Doctorate", "education"]="College"

Is there a way to write a function that will bin these categorical variables? Assigning each value on its own line does not seem like the best approach for a dataset with many different variables.

Upvotes: 1

Views: 657

Answers (1)

Gilseung Ahn

Reputation: 2624

You can define a dictionary that maps each old value to its new bin and pass it to replace. Refer to the following example code.

import pandas as pd

df = pd.DataFrame({"x":['a', 'b', 'c', 'a', 'b']})

# Map each original value to the bin it belongs to
value_dict = {'a':'A', 'b':'A', 'c':'B'}
df['x'] = df['x'].replace(value_dict)

You just need to define your value_dict (e.g., {"Preschool":"No Highschool", ... "Doctorate":"College"}).
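To reuse the same idea across many columns, the mapping can be wrapped in a small helper. The following is only a sketch: the names bin_categories, education_groups, and education_bin are hypothetical, the sample frame is made up for illustration, and the grouping simply mirrors the assignments in the question (including Prof-school under Highschool, which you may want to revisit).

```python
import pandas as pd

def bin_categories(series, groups, default=None):
    """Return a Series with each raw category replaced by its bin label.

    `groups` maps a bin label to the list of raw categories it covers.
    Values not listed in any group are kept unchanged unless `default` is given.
    """
    # Invert {bin: [categories]} into {category: bin} for direct lookup.
    lookup = {cat: label for label, cats in groups.items() for cat in cats}
    binned = series.map(lookup)  # unmapped values become NaN here
    if default is None:
        return binned.fillna(series)  # fall back to the original value
    return binned.fillna(default)

# Grouping copied from the question's assignments (an assumption, adjust as needed).
education_groups = {
    "No Highschool": ["Preschool", "1st-4th", "5th-6th", "7th-8th"],
    "Highschool": ["9th", "10th", "11th", "12th", "HS-grad", "Prof-school"],
    "College": ["Some-college", "Bachelors", "Masters",
                "Assoc-voc", "Assoc-acdm", "Doctorate"],
}

# Small made-up sample standing in for the Adult data.
adult = pd.DataFrame({"education": ["Preschool", "HS-grad", "Doctorate", "9th", "Unknown"]})
adult["education_bin"] = bin_categories(adult["education"], education_groups)
```

Writing the groups as bin -> list of categories keeps the definition short when many raw values share a bin, and writing to a new column leaves the original values available for checking the result.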

Upvotes: 2
