Sebastian Zeki
Sebastian Zeki

Reputation: 6874

Multiple case_when for text in one row

Data

I have a dataframe as follows:

structure(list(EndoscopyEventRaw = c("", "", "oesophagus:rfa;oesophagus:nac", 
"oesophagus:rfa;oesophagus:nac", "oesophagus:brushings", "oesophagus:rfa;oesophagus:emr;oesophagus:nac", 
"oesophagus:apc", "oesophagus:apc;oesophagus:nac", "oesophagus:apc", 
"")), row.names = c(NA, 10L), class = "data.frame")

Aim

I'd like to extract aspects of this into a new column, possibly using case_whenbased on rules as follows:

 dataframe<-dataframe %>%     mutate(OPCS4ZCode2 = case_when( 
      grepl("nac",EndoscopyEventRaw)~  "CodeForNAC",
      grepl("apc",EndoscopyEventRaw) ~  "CodeForAPC",
      grepl("rfa",EndoscopyEventRaw) ~  "CodeForRFA",
      grepl("grasp",EndoscopyEventRaw) ~  "CodeForGrasp"
    ),
    TRUE ~ ""
  )

Problem and desired result

However, some of the rows have more than one element that code be codified into the new column so that the end result should be:

1
2
3 CodeForRFA,CodeForNAC
4 CodeForRFA,CodeForNAC
5 
6 CodeForRFA,CodeForNAC
7 CodeForAPC
8 CodeForAPC,CodeForNAC
9 CodeForAPC
10

When I use case_when it stops looking when the first match is found. Is there a way of matching all the targets as above with or without using case_when?

Upvotes: 0

Views: 47

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 388807

Yes, case_when skips other conditions once a match is found. One way is to separate the data into different rows and then use the conditions with case_when and summarise the data back.

library(dplyr)

df %>%
  mutate(row = row_number()) %>%
  tidyr::separate_rows(EndoscopyEventRaw, sep = ";") %>%
  mutate(OPCS4ZCode2 = case_when(grepl("nac",EndoscopyEventRaw)~  "CodeForNAC",
                     grepl("apc",EndoscopyEventRaw) ~  "CodeForAPC",
                     grepl("rfa",EndoscopyEventRaw) ~  "CodeForRFA",
                     grepl("grasp",EndoscopyEventRaw) ~  "CodeForGrasp",
                     TRUE ~ "")) %>%
   group_by(row) %>%
   summarise(OPCS4ZCode2 = toString(OPCS4ZCode2)) %>%
   select(-row)

# A tibble: 10 x 1
#   OPCS4ZCode2             
#   <chr>                   
# 1 ""                      
# 2 ""                      
# 3 CodeForRFA, CodeForNAC  
# 4 CodeForRFA, CodeForNAC  
# 5 ""                      
# 6 CodeForRFA, , CodeForNAC
# 7 CodeForAPC              
# 8 CodeForAPC, CodeForNAC  
# 9 CodeForAPC              
#10 ""                      

Upvotes: 2

Related Questions