Reputation: 135
I have a column with 692 degree titles that I need to categorize as: certificate, associate, bachelor, master or higher. There is a lot of inconsistency in a degree titles. For example, a BS degree might include BS, B S, BS in Nursing, BSE, B.S. Accounting, Bachelor of Science, Bachelor of Science in Genetics, and so on. Each one need to be categorized as "Bachelor".
I've tried to detect as much of the string as possible using str_detect, but have not been very successful. How would I detect these different types of degrees?
What I have | What I need |
---|---|
Bachelor of Science | Bachelor |
BA | Bachelor |
BFA | Bachelor |
Certificate in Nursing | Certificate |
Associates in Art | Associate |
AA | Associate |
MS | Master or higher |
Masters of Art | Master or higher |
Upvotes: 1
Views: 50
Reputation: 7116
Maybe something like this?
library(tidyverse)
df <-
tibble::tribble(
~What.I.have, ~What.I.need,
"Bachelor of Science", "Bachelor",
"BA", "Bachelor",
"BFA", "Bachelor",
"Certificate in Nursing", "Certificate",
"Associates in Art", "Associate",
"AA", "Associate",
"MS", "Master or higher",
"Masters of Art", "Master or higher"
)
df %>% mutate(new = case_when(str_detect(What.I.have, 'Bachelor|BA|BFA') ~ 'Bachelor',
str_detect(What.I.have, 'Certificate') ~ 'Certificate',
str_detect(What.I.have, 'Associates|AA') ~ 'Associate',
str_detect(What.I.have, 'Masters|MS') ~ 'Master or higher'))
#> # A tibble: 8 × 3
#> What.I.have What.I.need new
#> <chr> <chr> <chr>
#> 1 Bachelor of Science Bachelor Bachelor
#> 2 BA Bachelor Bachelor
#> 3 BFA Bachelor Bachelor
#> 4 Certificate in Nursing Certificate Certificate
#> 5 Associates in Art Associate Associate
#> 6 AA Associate Associate
#> 7 MS Master or higher Master or higher
#> 8 Masters of Art Master or higher Master or higher
Created on 2022-01-04 by the reprex package (v2.0.1)
What @Onyambu suggested in the comments will also yield the correct results for this data.
library(tidyverse)
df %>%
transmute(new = case_when(
str_detect(What.I.have, "^B") ~ "Bachelor",
str_detect(What.I.have, "^C") ~ "Certificate",
str_detect(What.I.have, "^A") ~ "Associate",
str_detect(What.I.have, "^M") ~ "Master or higher"
))
#> # A tibble: 8 × 1
#> new
#> <chr>
#> 1 Bachelor
#> 2 Bachelor
#> 3 Bachelor
#> 4 Certificate
#> 5 Associate
#> 6 Associate
#> 7 Master or higher
#> 8 Master or higher
Created on 2022-01-04 by the reprex package (v2.0.1)
Upvotes: 1