sutsabs

Reputation: 415

R: reorder comma-separated strings within a column based on a list

I have a data frame with a column named line_name, which holds comma-separated strings (e.g. "A,B,C").

If any string from the pdl1_mono list appears in line_name, I want that string moved to the front. Right now the strings within line_name are ordered alphabetically, so I am essentially trying to reorder the strings within the line_name column.

For example, if line_name = "carboplatin,paclitaxel,pembrolizumab", then since "pembrolizumab" is in the pdl1_mono list, I want to update it to "pembrolizumab,carboplatin,paclitaxel".

pdl1_mono <- c('atezolizumab', 'cemiplimab', 'durvalumab', 'nivolumab', 'nivolumab,ipilimumab',
               'pembrolizumab')

df <- df %>% 
        mutate(
            line_name = sub('^(.*),pembrolizumab', 'pembrolizumab,\\1', line_name),
            line_name = sub('^(.*),nivolumab', 'nivolumab,\\1', line_name)
        )

I am able to do this with sub(), using one call per item in pdl1_mono. Is there a dynamic way to update the line_name column instead of manually repeating the call for each item? I tried using map, but I don't think it is right; I kept getting an error.

df1 <- df %>% 
     mutate(
         line_name = 
             case_when(str_detect(line_name, paste(pdl1_mono, collapse = "|")) ~ map(pdl1_mono, 
                                                                                     sub(paste0("^(.*),",.,"'"), paste0(.,",\\1"), line_name)),
                       TRUE ~.                          ))

Here is a sample dataset for testing:

structure(list(patient_id = c("FB686A25501E9", "F8B1ED05646B2", 
"FC3D22E4CC73C", "F26230CDC7E74", "F22AF8C94E657", "FAF785C1A151A", 
"F6E1B8F6F0EA9", "F428FFA3E8E61", "F6B57AFAF6B24", "FA7560BD1D1AD", 
"FAAC879CAF38F", "F1C824D17FCB9", "F25182C921986", "F890B3306D38F", 
"FF26E35E93510", "FB4FB81ACD59D", "FA32928EF3D5B", "FA7D9EBBD7483", 
"FF3F362DE9D91", "F0BA038C8DD49"), line_name = c("bevacizumab,carboplatin,pemetrexed", 
"dabrafenib,trametinib", "gemcitabine,paclitaxel", "nivolumab", 
"carboplatin,paclitaxel protein-bound", "cisplatin,etoposide", 
"paclitaxel,pertuzumab,trastuzumab", "carboplatin,paclitaxel protein-bound,pembrolizumab", 
"carboplatin,gemcitabine", "afatinib,carboplatin,pembrolizumab,pemetrexed", 
"paclitaxel protein-bound", "bevacizumab,carboplatin,pemetrexed", 
"carboplatin,docetaxel", "cisplatin,etoposide", "carboplatin,paclitaxel protein-bound", 
"carboplatin,pemetrexed", "nivolumab", "afatinib", "carboplatin,paclitaxel", 
"bevacizumab,carboplatin,pemetrexed")), row.names = c(NA, -20L
), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 1

Views: 59

Answers (1)

coffeinjunky

Reputation: 11514

Please test this:

library(tidyverse)

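# Move any element of `string` that is found in `group` to the front.
# `string` is a single comma-separated string, `group` a character vector.
re_do <- function(string, group) {
  # Split the string into its individual items
  vec <- unlist(strsplit(string, ','))
  # Positions of the items that belong to the group
  i <- which(vec %in% group)

  if (length(i) > 0) {
    # Matched items first, then the rest in their original order
    new_vec <- paste0(c(vec[i], vec[-i]), collapse = ',')
    return(new_vec)
  } else {
    # No match: return the string unchanged
    return(string)
  }
}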


df %>% 
  rowwise() %>% 
  mutate(new_vec = re_do(line_name, pdl1_mono))

# A tibble: 20 x 3
# Rowwise: 
   patient_id    line_name                                          new_vec                                           
   <chr>         <chr>                                              <chr>                                             
 1 FB686A25501E9 bevacizumab,carboplatin,pemetrexed                 bevacizumab,carboplatin,pemetrexed                
 2 F8B1ED05646B2 dabrafenib,trametinib                              dabrafenib,trametinib                             
 3 FC3D22E4CC73C gemcitabine,paclitaxel                             gemcitabine,paclitaxel                            
 4 F26230CDC7E74 nivolumab                                          nivolumab                                         
 5 F22AF8C94E657 carboplatin,paclitaxel protein-bound               carboplatin,paclitaxel protein-bound              
 6 FAF785C1A151A cisplatin,etoposide                                cisplatin,etoposide                               
 7 F6E1B8F6F0EA9 paclitaxel,pertuzumab,trastuzumab                  paclitaxel,pertuzumab,trastuzumab                 
 8 F428FFA3E8E61 carboplatin,paclitaxel protein-bound,pembrolizumab pembrolizumab,carboplatin,paclitaxel protein-bound
 9 F6B57AFAF6B24 carboplatin,gemcitabine                            carboplatin,gemcitabine                           
10 FA7560BD1D1AD afatinib,carboplatin,pembrolizumab,pemetrexed      pembrolizumab,afatinib,carboplatin,pemetrexed     
11 FAAC879CAF38F paclitaxel protein-bound                           paclitaxel protein-bound                          
12 F1C824D17FCB9 bevacizumab,carboplatin,pemetrexed                 bevacizumab,carboplatin,pemetrexed                
13 F25182C921986 carboplatin,docetaxel                              carboplatin,docetaxel                             
14 F890B3306D38F cisplatin,etoposide                                cisplatin,etoposide                               
15 FF26E35E93510 carboplatin,paclitaxel protein-bound               carboplatin,paclitaxel protein-bound              
16 FB4FB81ACD59D carboplatin,pemetrexed                             carboplatin,pemetrexed                            
17 FA32928EF3D5B nivolumab                                          nivolumab                                         
18 FA7D9EBBD7483 afatinib                                           afatinib                                          
19 FF3F362DE9D91 carboplatin,paclitaxel                             carboplatin,paclitaxel                            
20 F0BA038C8DD49 bevacizumab,carboplatin,pemetrexed                 bevacizumab,carboplatin,pemetrexed                
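
If rowwise() turns out to be slow on a larger data frame, the same split-and-reorder idea can probably be written so that it accepts the whole column at once. The following is only a sketch in base R, not something from the original answer: the name move_to_front is made up for illustration, and it assumes the items are separated by a bare comma with no surrounding spaces, as in the sample data. Note that the entry 'nivolumab,ipilimumab' in pdl1_mono contains a comma itself, so it can never match a single item after splitting (this applies to re_do as well).

```r
# Hypothetical helper (name chosen for illustration only).
# x:     character vector of comma-separated strings
# group: character vector of items that should be moved to the front
move_to_front <- function(x, group) {
  # One character vector of items per input string
  parts <- strsplit(x, ",", fixed = TRUE)
  out <- vapply(parts, function(vec) {
    hit <- vec %in% group
    # Matched items first, the remaining items keep their original order
    paste(c(vec[hit], vec[!hit]), collapse = ",")
  }, character(1))
  # Keep missing values missing instead of turning them into the string "NA"
  out[is.na(x)] <- NA_character_
  out
}

pdl1_mono <- c("atezolizumab", "cemiplimab", "durvalumab", "nivolumab",
               "nivolumab,ipilimumab", "pembrolizumab")

move_to_front(c("carboplatin,paclitaxel,pembrolizumab",
                "cisplatin,etoposide",
                "nivolumab"),
              pdl1_mono)
# [1] "pembrolizumab,carboplatin,paclitaxel" "cisplatin,etoposide"
# [3] "nivolumab"
```

Because the function takes a vector and returns a vector of the same length, it should work inside mutate() without rowwise(), e.g. df %>% mutate(line_name = move_to_front(line_name, pdl1_mono)).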

Upvotes: 1
