Juliana B C

How can I use mutate() and case_when() with a list of words?

I have a script like this:

library(dplyr)

path_ko_rn_eq_car_2 = path_ko_rn_eq_car %>%
  mutate(new_var = case_when(grepl("C00469", equation) ~ "C00469",
                             grepl("C00009", equation) ~ "C00009",
                             grepl("C00409", equation) ~ "C00409",
                             grepl("C01009", equation) ~ "C01009",
                             grepl("C03409", equation) ~ "C03409",
                             grepl("C09091", equation) ~ "C09091",
                             grepl("C00001", equation) ~ "C00001",
                             grepl("C01001", equation) ~ "C01001",
                             grepl("C03402", equation) ~ "C03402",
                             grepl("C09095", equation) ~ "C09095",
                             grepl("C00006", equation) ~ "C00006",
                             grepl("C00005", equation) ~ "C00005"))

Is it possible to make this script shorter? I tried this:

cc = c("C00469", "C00084", "C00001", "C03409", "C03402")
my_function = function(x){path_ko_rn_eq_car_2 = path_ko_rn_eq_car %>% mutate(new_var = case_when(grepl(x, equation) ~ x))}

data.frame(sapply(cc, my_function))
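One way to let a vector of codes play the role of the case_when() branches is sketched below. This is an illustration under assumptions, not code from the question or the answer: first_match() is a hypothetical helper written in base R, and the equation strings are made up. Like case_when(), it checks the codes in the order given, keeps the first one found in each string, and returns NA when none is found.

```r
# Hypothetical helper: for each string in `x`, return the first entry of
# `codes` (checked in the order given) that occurs in it, or NA if none does.
# This mirrors case_when(), whose conditions are evaluated top to bottom.
first_match <- function(x, codes) {
  out <- rep(NA_character_, length(x))
  for (code in codes) {
    # fixed = TRUE: treat each code as a literal string, not a regex.
    # Only rows that are still NA are filled, so earlier codes win.
    hit <- is.na(out) & grepl(code, x, fixed = TRUE)
    out[hit] <- code
  }
  out
}

cc <- c("C00469", "C00084", "C00001", "C03409", "C03402")

# Made-up equation strings, for illustration only.
equation <- c("C00001 + C00469 <=> C00084",
              "C03402 <=> C03409",
              "C99999 <=> C88888")

print(first_match(equation, cc))
# "C00469" "C03409" NA
```

With dplyr loaded, the helper would be called inside the pipeline as path_ko_rn_eq_car %>% mutate(new_var = first_match(equation, cc)), assuming the column is named equation as in the question. Because the codes are checked in order, the first row gets "C00469" even though "C00001" appears earlier in the string.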

Answers (1)

Ricardo Semião

You can extract the pattern "the letter C followed by five digits" with gsub():

mutate(path_ko_rn_eq_car, new_var = gsub(".*(C[0-9]{5}).*", "\\1", equation))

REGEX explanation:

  • The '.*' parts match everything outside the '()' ('.' = any character, '*' = zero or more times). Using '*' rather than '+' means the code is still found when it sits at the very start or end of the string.
  • The '()' define a group, so that gsub can return only that group.
  • 'C[0-9]{5}': 'C' is the literal letter, '[0-9]' is any digit from 0 to 9, and '{5}' means exactly 5 repetitions.
  • '\1' (written "\\1" inside an R string) means "return the first group", i.e. everything inside the '()'.

Note that the leading '.*' is greedy: if an equation contains several codes, the last one is returned, and an equation containing no code is returned unchanged.
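The behaviour of the pattern can be checked on a few strings, as in the sketch below. The equation strings are made up and assume a "C00001 + C00002 <=> ..." format. The second part, first_code(), is a hypothetical variant (not part of the answer) that uses base R regexpr() and regmatches() to take the first code in each string and return NA when there is none.

```r
# Made-up equation strings, assuming a "C00001 + C00002 <=> ..." format.
equation <- c("C00469 + C00001 <=> C00084",
              "C00002 <=> C00008",
              "no compound here")

# The pattern from the answer. The leading .* is greedy, so the captured
# group is the LAST code in each string; strings with no code come back
# unchanged.
last_code <- sub(".*(C[0-9]{5}).*", "\\1", equation)
print(last_code)
# "C00084" "C00008" "no compound here"

# Hypothetical variant: take the FIRST code, and use NA when there is none.
first_code <- function(x) {
  pos <- regexpr("C[0-9]{5}", x)        # start of first match, -1 if none
  out <- rep(NA_character_, length(x))
  out[pos != -1] <- regmatches(x, pos)  # matched substrings, in input order
  out
}
print(first_code(equation))
# "C00469" "C00002" NA
```

Which variant fits depends on whether the position of the code in the equation or a fixed priority list (as in the original case_when()) should decide the result.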
