replace elements with a pattern in a dataframe in r

Question

I have one data frame similar to:

df1 = read.table(text="name element
    m20895  P540554
    m20897  S01367
    m20907  P540554,P209332
    m20914  S01367,S01247
    m20915  S01247
    m20925  S00897
    m26943  P540554,P209332
    m26944  S00897,S00898,S00899,S00900", header=T, stringsAsFactors=F)

I would like to replace any element(s) starting with "S" by the corresponding value in the "B" column of data frame (df2)

    df2 = read.table(text="A    B
    S01367  P00432
    S01247  P00433
    S00897  P00434
    S00898  P00435
    S00899  P00436
    S00900  P00437
    S00901  P00438", header=T, stringsAsFactors=F)

so the result expected:

    result = read.table(text="name  element
    m20895  P540554
    m20897  P00432
    m20907  P540554,P209332
    m20914  P00432,P00433
    m20915  P00433
    m20925  P00434
    m26943  P540554,P209332
    m26944  P00434,P00435,P00436,P00437", header=T, stringsAsFactors=F)

Cath · Accepted Answer

You first need to split your element column by ,:

element_split <- strsplit(df1$element, ",")

You obtain a list and, for each element of this list, you can look for item beginning with S (grep("^S", ...)) and replace them with the corresponding B value in your df2 (if there is none, then you just return the item as is):

l_element_B <- lapply(element_split, 
                      function(x) if(any(y <- grep("^S", x))) {x[y] <- df2$B[match(x[y], df2$A)]; x} else x)

Then you can collapse your elements with , to get your new column (or keep it as list to make it easier to work with):

df1$new_element <- sapply(l_element_B, paste, collapse=",")

df1
#    name                     element                 new_element
#1 m20895                     P540554                     P540554
#2 m20897                      S01367                      P00432
#3 m20907             P540554,P209332             P540554,P209332
#4 m20914               S01367,S01247               P00432,P00433
#5 m20915                      S01247                      P00433
#6 m20925                      S00897                      P00434
#7 m26943             P540554,P209332             P540554,P209332
#8 m26944 S00897,S00898,S00899,S00900 P00434,P00435,P00436,P00437

N.B:

Of course you can directly modify your column by doing

df1$element <- sapply(l_element_B, paste, collapse=",")

instead of

df1$new_element <- sapply(l_element_B, paste, collapse=",")

replace elements with a pattern in a dataframe in r

Answers (2)

Related Questions