Reputation: 7317
I have this data.table with strings:
dt = tbl_dt(data.table(x=c("book|ball|apple", "flower|orange|cup", "banana|bandana|pen")))
                   x
1    book|ball|apple
2  flower|orange|cup
3 banana|bandana|pen
I also have a reference string that I would like to match against each string in the data.table, extracting the matching word if there is one, like so:
fruits = "apple|banana|orange"
str_match(fruits, "flower|orange|cup")
>"orange"
How do I do this for the entire data.table?
require(dplyr)
require(stringr)
dt %>%
mutate (fruit = str_match(fruits, x))
Error in rep(NA_character_, n) : invalid 'times' argument
In addition: Warning message:
In regexec(c("book|ball|apple", "flower|orange|cup", "banana|bandana|pen" :
argument 'pattern' has length > 1 and only the first element will be used
What I would like:
                   x  fruit
1    book|ball|apple  apple
2  flower|orange|cup orange
3 banana|bandana|pen banana
Upvotes: 1
Views: 476
Reputation: 24555
A solution using base R and without str_match:
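# ddf is assumed to be a plain data.frame holding the question's data
ddf = data.frame(x = c("book|ball|apple", "flower|orange|cup", "banana|bandana|pen"),
                 stringsAsFactors = FALSE)
fruits = "apple|banana|orange"
reflist = unlist(strsplit(fruits, '\\|'))
fruit = character(nrow(ddf))
for (i in seq_along(ddf$x)) {
  ss = unlist(strsplit(ddf$x[i], '\\|'))
  # keep the first word found in the reference list, or NA if none matches
  fruit[i] = ss[ss %in% reflist][1]
}
ddf$fruit = fruit
ddf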
#                    x  fruit
#1    book|ball|apple  apple
#2  flower|orange|cup orange
#3 banana|bandana|pen banana
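The same result can be had in base R without an explicit loop. This is only a minimal sketch: it keeps the first matching word per string, it assumes substring hits (for example "pineapple" matching "apple") are not a concern, and the variable names are illustrative.

```r
x <- c("book|ball|apple", "flower|orange|cup", "banana|bandana|pen")
fruits <- "apple|banana|orange"

# regexpr() gives the position of the first match in each string, or -1 if none
m <- regexpr(fruits, x)

# start with NA everywhere, then fill in only the strings that matched
fruit <- rep(NA_character_, length(x))
fruit[m != -1] <- regmatches(x, m)

fruit
```

Strings with no match stay NA, so the result always has one element per input string and can be assigned as a column.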
Upvotes: 0
Reputation: 92292
Or use data.table directly (to avoid warnings, it is better to create the table with data.table rather than tbl_dt):
dt[, fruits := mapply(str_match, fruits, x)]
dt
##                     x fruits
## 1:    book|ball|apple  apple
## 2:  flower|orange|cup orange
## 3: banana|bandana|pen banana
Or you could do something similar to @akrun's answer, such as
dt[, fruits := unlist(lapply(x, str_match, fruits))]
Upvotes: 2
Reputation: 887173
dt$fruit <- unlist(lapply(dt$x, str_match, fruits))
dt
#Source: local data table [3 x 2]
#
#                    x  fruit
#1    book|ball|apple  apple
#2  flower|orange|cup orange
#3 banana|bandana|pen banana
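As a possible alternative, str_extract returns a plain character vector rather than a matrix, so it can be used as a column without unlist. The sketch below assumes a stringr version in which str_extract is vectorised over the string argument (this holds for current releases, as far as I know) and uses a plain data.frame for illustration.

```r
library(stringr)

x <- c("book|ball|apple", "flower|orange|cup", "banana|bandana|pen")
fruits <- "apple|banana|orange"

# first match of the reference pattern in each string; NA where nothing matches
fruit <- str_extract(x, fruits)

data.frame(x = x, fruit = fruit, stringsAsFactors = FALSE)
```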
Upvotes: 1