Reputation: 11
I am trying to bin a categorical variable in R, but I am unable to cluster the values into useful groups.
For example, take the variable Grade below, which contains the following unique values.
Grade <- c("OM1", "OM2", "PC1", "SC1", "SC3", "AM1", "AM3", "PL2", "SC2", "UH1", "SS2", "PM3")
These are the different grades that a company assigns to its employees. I want to group them into meaningful groups like:
GROUP1 - Low grades: should contain grades of low priority given to trainees, like OM1, OM2 and PC1.
GROUP2 - Medium grades: should contain grades of medium priority given to employees with 3-4 years of experience, like SC3, AM1, AM3 and PL2.
GROUP3 - High grades: should contain grades of high priority given to VPs and delivery managers, like SC3, AM1, AM3 and PL2.
Any help would be deeply appreciated. Thanks in advance.
Upvotes: 1
Views: 4658
Reputation: 377
I'd do this with a merge (in base R) or a join (in dplyr) between the data you already have and a lookup table of grade bins. I assume that you already have a data frame dat that has a field Grade. Then you can do the following. (The call to tribble is just one of many ways to create a data frame that shows the grade bins.)
library(dplyr)
grade_bins = tribble(
~Grade, ~bin,
'OM1', 'low',
'OM2', 'low',
'PC1', 'low',
'SC1', 'med',
'SC3', 'med',
'AM1', 'med',
'AM3', 'med',
'PL2', 'med',
'SC2', 'high',
'UH1', 'high',
'SS2', 'high',
'PM3', 'high')
dat_with_grades = left_join(dat, grade_bins, by = 'Grade')
I do a left_join because in my experience these sorts of data sets end up having values of the variable you're joining on (in this case, employee grades) that you don't know exist. In this case dat_with_grades will just have NA in the bin column for those employees, as opposed to silently dropping them.
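The base R merge mentioned at the start of this answer can be sketched as follows. This is only a minimal illustration, not the only way to do it: the contents of dat, its emp_id column, and the unlisted grade 'ZZ9' are made up for the example.

```r
# Hypothetical employee data; 'ZZ9' is a grade missing from the lookup table.
dat <- data.frame(
  emp_id = 1:4,
  Grade  = c("OM1", "SC3", "UH1", "ZZ9"),
  stringsAsFactors = FALSE
)

# Lookup table with the same bins as in the tribble() call above.
grade_bins <- data.frame(
  Grade = c("OM1", "OM2", "PC1", "SC1", "SC3", "AM1",
            "AM3", "PL2", "SC2", "UH1", "SS2", "PM3"),
  bin   = rep(c("low", "med", "high"), times = c(3, 5, 4)),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of dat, which makes this a left join.
dat_with_grades <- merge(dat, grade_bins, by = "Grade", all.x = TRUE)

# Rows whose grade has no bin show up with NA instead of disappearing.
dat_with_grades[is.na(dat_with_grades$bin), ]
```

Note that merge sorts the result by the join column by default, so the row order may differ from dat.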
Upvotes: 0
Reputation: 23608
I'm going to assume that group 3 will contain the grades not specified in groups 1 and 2.
Grade <- c("OM1", "OM2", "PC1", "SC1", "SC3", "AM1", "AM3", "PL2", "SC2", "UH1", "SS2", "PM3")
base R:
ifelse(Grade %in% c("OM1", "OM2", "PC1"), "Low grades",
       ifelse(Grade %in% c("SC1", "SC3", "AM1", "AM3", "PL2"), "Medium grades", "High grades"))
dplyr:
library(dplyr)
case_when(Grade %in% c("OM1", "OM2", "PC1") ~ "Low grades",
          Grade %in% c("SC1", "SC3", "AM1", "AM3", "PL2") ~ "Medium grades",
          TRUE ~ "High grades")
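One thing to keep in mind with a catch-all final branch: any grade not listed in the first two groups, including a typo or a grade you did not know about, ends up labeled "High grades". Here is a rough sketch of applying the base R version to a data frame column and checking the result. The data frame dat, its emp_id column, the helper vectors low_grades and med_grades, and the grade 'ZZ9' are all made up for illustration.

```r
# Hypothetical employee data; 'ZZ9' is not one of the twelve known grades.
dat <- data.frame(
  emp_id = 1:5,
  Grade  = c("OM1", "AM3", "PM3", "ZZ9", "PC1"),
  stringsAsFactors = FALSE
)

# Grades listed explicitly for the first two groups.
low_grades <- c("OM1", "OM2", "PC1")
med_grades <- c("SC1", "SC3", "AM1", "AM3", "PL2")

# Same nested ifelse as above, stored as a new column.
dat$grade_group <- ifelse(dat$Grade %in% low_grades, "Low grades",
                   ifelse(dat$Grade %in% med_grades, "Medium grades",
                          "High grades"))

# Count employees per group as a quick sanity check.
table(dat$grade_group)

# The unknown grade falls through to the catch-all branch.
dat[dat$Grade == "ZZ9", ]
```

If unknown grades should be flagged instead, listing the high grades explicitly and returning NA in the final branch is one option.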
Upvotes: 1