user3354212
user3354212

Reputation: 1112

replace elements with a pattern in a dataframe in r

I have one data frame similar to:

df1 = read.table(text="name element
    m20895  P540554
    m20897  S01367
    m20907  P540554,P209332
    m20914  S01367,S01247
    m20915  S01247
    m20925  S00897
    m26943  P540554,P209332
    m26944  S00897,S00898,S00899,S00900", header=T, stringsAsFactors=F)

I would like to replace any element(s) starting with "S" by the corresponding value in the "B" column of data frame (df2)

    df2 = read.table(text="A    B
    S01367  P00432
    S01247  P00433
    S00897  P00434
    S00898  P00435
    S00899  P00436
    S00900  P00437
    S00901  P00438", header=T, stringsAsFactors=F)

so the result expected:

    result = read.table(text="name  element
    m20895  P540554
    m20897  P00432
    m20907  P540554,P209332
    m20914  P00432,P00433
    m20915  P00433
    m20925  P00434
    m26943  P540554,P209332
    m26944  P00434,P00435,P00436,P00437", header=T, stringsAsFactors=F)

Upvotes: 0

Views: 160

Answers (2)

Cath
Cath

Reputation: 24074

You first need to split your element column by ,:

element_split <- strsplit(df1$element, ",")

You obtain a list and, for each element of this list, you can look for item beginning with S (grep("^S", ...)) and replace them with the corresponding B value in your df2 (if there is none, then you just return the item as is):

l_element_B <- lapply(element_split, 
                      function(x) if(any(y <- grep("^S", x))) {x[y] <- df2$B[match(x[y], df2$A)]; x} else x)

Then you can collapse your elements with , to get your new column (or keep it as list to make it easier to work with):

df1$new_element <- sapply(l_element_B, paste, collapse=",")

df1
#    name                     element                 new_element
#1 m20895                     P540554                     P540554
#2 m20897                      S01367                      P00432
#3 m20907             P540554,P209332             P540554,P209332
#4 m20914               S01367,S01247               P00432,P00433
#5 m20915                      S01247                      P00433
#6 m20925                      S00897                      P00434
#7 m26943             P540554,P209332             P540554,P209332
#8 m26944 S00897,S00898,S00899,S00900 P00434,P00435,P00436,P00437

N.B:

Of course you can directly modify your column by doing

df1$element <- sapply(l_element_B, paste, collapse=",")

instead of

df1$new_element <- sapply(l_element_B, paste, collapse=",")

Upvotes: 3

Roman
Roman

Reputation: 17648

You can try this as well

library(tidyverse)
df1 %>% 
  separate_rows(element) %>% 
  left_join(df2, by=c("element" = "A")) %>% 
  mutate(element = ifelse(is.na(B), element, B)) %>% 
  group_by(name) %>% 
  summarise(element= paste(element, collapse = ","))  
# A tibble: 8 x 2
  name   element                    
  <chr>  <chr>                      
1 m20895 P540554                    
2 m20897 P00432                     
3 m20907 P540554,P209332            
4 m20914 P00432,P00433              
5 m20915 P00433                     
6 m20925 P00434                     
7 m26943 P540554,P209332            
8 m26944 P00434,P00435,P00436,P00437

Upvotes: 2

Related Questions