Vivek

Reputation: 11

Binning of categorical variables

I am trying to bin categorical variables in R, but I can't work out how to cluster the values into useful groups.

For example, take the variable Grade below, which contains the following unique values.

Grade <- c("OM1", "OM2", "PC1", "SC1", "SC3", "AM1", "AM3", "PL2", "SC2", "UH1", "SS2", "PM3")

These are the different grades that a company assigns to its employees. I want to group them into meaningful categories like:

GROUP 1 - Low grades - should contain grades of low priority given to trainees, like OM1, OM2 and PC1.

GROUP 2 - Medium grades - should contain grades of medium priority given to employees with 3-4 years of experience, like SC3, AM1, AM3 and PL2.

GROUP 3 - High grades - should contain grades of high priority given to VPs and delivery managers, like SC3, AM1, AM3 and PL2.

Any help would be deeply appreciated. Thanks in advance.

Upvotes: 1

Views: 4658

Answers (2)

Michael Lugo

Reputation: 377

I'd do this with a merge (in base R) or a join (in dplyr) between the data you already have and a lookup table of grade bins. I assume that you already have a data frame dat that has a field Grade. Then you can do the following. (The call to tribble is just one of many ways to create a data frame that shows the grade bins.)

library(dplyr)
grade_bins = tribble(
    ~Grade, ~bin,
    'OM1', 'low',
    'OM2', 'low',
    'PC1', 'low',
    'SC1', 'med', 
    'SC3', 'med',  
    'AM1', 'med', 
    'AM3', 'med', 
    'PL2', 'med',
    'SC2', 'high',
    'UH1', 'high',
    'SS2', 'high',
    'PM3', 'high')
dat_with_grades = left_join(dat, grade_bins, by = 'Grade')

I use left_join because, in my experience, data sets like this end up containing values of the variable you're joining on (in this case, employee grades) that you didn't know existed. With a left join, dat_with_grades will just have NA in the bin column for those employees, as opposed to silently dropping them.
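The base R merge mentioned at the start can be sketched along the same lines. This is only a minimal illustration: the contents of dat, the employee column and the made-up grade "ZZ9" are assumptions for the example, not part of the original data.

```r
# Hypothetical example data; `dat`, `employee` and "ZZ9" are assumptions.
dat <- data.frame(
  employee = c("A", "B", "C", "D"),
  Grade    = c("OM1", "SC3", "UH1", "ZZ9"),  # "ZZ9" is deliberately not in the lookup
  stringsAsFactors = FALSE
)

# Lookup table mapping each known grade to its bin.
grade_bins <- data.frame(
  Grade = c("OM1", "OM2", "PC1", "SC1", "SC3", "AM1", "AM3", "PL2",
            "SC2", "UH1", "SS2", "PM3"),
  bin   = c(rep("low", 3), rep("med", 5), rep("high", 4)),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of dat (a left join); unmatched grades get NA.
dat_with_grades <- merge(dat, grade_bins, by = "Grade", all.x = TRUE)

# Grades that had no entry in the lookup table.
unmatched <- dat_with_grades$Grade[is.na(dat_with_grades$bin)]
```

Note that merge sorts the result by the join column by default, so the row order may differ from dat; pass sort = FALSE if that matters.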

Upvotes: 0

phiver

Reputation: 23608

I'm going to assume that group 3 will contain the grades not specified in groups 1 and 2.

Grade <- c("OM1", "OM2", "PC1", "SC1", "SC3", "AM1", "AM3", "PL2", "SC2", "UH1", "SS2", "PM3") 


base R:
ifelse(Grade %in% c("OM1", "OM2", "PC1"), "Low grades",
       ifelse(Grade %in% c("SC1", "SC3", "AM1", "AM3", "PL2"), "Medium grades", "High grades"))

dplyr:
library(dplyr)
case_when(Grade %in% c("OM1", "OM2", "PC1") ~ "Low grades",
          Grade %in% c("SC1", "SC3", "AM1", "AM3", "PL2") ~ "Medium grades",
          TRUE ~ "High grades")
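If the grades live in a data frame column rather than a standalone vector, the same mapping can be stored as a new column. The sketch below uses the base R version; the data frame dat, its employee column, and the helper vectors low_grades and med_grades are hypothetical names introduced for illustration. Converting the result to an ordered factor is optional, but it keeps the groups in Low, Medium, High order in tables and plots.

```r
# Hypothetical data frame; `dat` and `employee` are assumptions for illustration.
dat <- data.frame(
  employee = c("A", "B", "C"),
  Grade    = c("OM2", "AM1", "PM3"),
  stringsAsFactors = FALSE
)

low_grades <- c("OM1", "OM2", "PC1")
med_grades <- c("SC1", "SC3", "AM1", "AM3", "PL2")

# Same nested ifelse() mapping as above, stored as a new column.
dat$group <- ifelse(dat$Grade %in% low_grades, "Low grades",
                    ifelse(dat$Grade %in% med_grades, "Medium grades", "High grades"))

# An ordered factor keeps Low < Medium < High when tabulating or plotting.
dat$group <- factor(dat$group,
                    levels = c("Low grades", "Medium grades", "High grades"),
                    ordered = TRUE)

table(dat$group)
```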

Upvotes: 1
