Reputation: 1112
I have one data frame similar to:
df1 = read.table(text="name element
m20895 P540554
m20897 S01367
m20907 P540554,P209332
m20914 S01367,S01247
m20915 S01247
m20925 S00897
m26943 P540554,P209332
m26944 S00897,S00898,S00899,S00900", header=T, stringsAsFactors=F)
I would like to replace any element(s) starting with "S" by the corresponding value in the "B" column of data frame (df2)
df2 = read.table(text="A B
S01367 P00432
S01247 P00433
S00897 P00434
S00898 P00435
S00899 P00436
S00900 P00437
S00901 P00438", header=T, stringsAsFactors=F)
so the result expected:
result = read.table(text="name element
m20895 P540554
m20897 P00432
m20907 P540554,P209332
m20914 P00432,P00433
m20915 P00433
m20925 P00434
m26943 P540554,P209332
m26944 P00434,P00435,P00436,P00437", header=T, stringsAsFactors=F)
Upvotes: 0
Views: 160
Reputation: 24074
You first need to split your element
column by ,
:
element_split <- strsplit(df1$element, ",")
You obtain a list and, for each element of this list, you can look for item beginning with S
(grep("^S", ...)
) and replace them with the corresponding B
value in your df2
(if there is none, then you just return the item as is):
l_element_B <- lapply(element_split,
function(x) if(any(y <- grep("^S", x))) {x[y] <- df2$B[match(x[y], df2$A)]; x} else x)
Then you can collapse your elements with ,
to get your new column (or keep it as list to make it easier to work with):
df1$new_element <- sapply(l_element_B, paste, collapse=",")
df1
# name element new_element
#1 m20895 P540554 P540554
#2 m20897 S01367 P00432
#3 m20907 P540554,P209332 P540554,P209332
#4 m20914 S01367,S01247 P00432,P00433
#5 m20915 S01247 P00433
#6 m20925 S00897 P00434
#7 m26943 P540554,P209332 P540554,P209332
#8 m26944 S00897,S00898,S00899,S00900 P00434,P00435,P00436,P00437
N.B:
Of course you can directly modify your column by doing
df1$element <- sapply(l_element_B, paste, collapse=",")
instead of
df1$new_element <- sapply(l_element_B, paste, collapse=",")
Upvotes: 3
Reputation: 17648
You can try this as well
library(tidyverse)
df1 %>%
separate_rows(element) %>%
left_join(df2, by=c("element" = "A")) %>%
mutate(element = ifelse(is.na(B), element, B)) %>%
group_by(name) %>%
summarise(element= paste(element, collapse = ","))
# A tibble: 8 x 2
name element
<chr> <chr>
1 m20895 P540554
2 m20897 P00432
3 m20907 P540554,P209332
4 m20914 P00432,P00433
5 m20915 P00433
6 m20925 P00434
7 m26943 P540554,P209332
8 m26944 P00434,P00435,P00436,P00437
Upvotes: 2