jenswirf

Reputation: 7317

How to extract a partially matched string into a new column?

I have this data.table with strings:

dt = tbl_dt(data.table(x=c("book|ball|apple", "flower|orange|cup", "banana|bandana|pen")))

                   x
1    book|ball|apple
2  flower|orange|cup
3 banana|bandana|pen

I also have a reference string that I would like to match against each string in the data.table, extracting the matching word if there is one, like so:

fruits = "apple|banana|orange"

str_match(fruits, "flower|orange|cup")
>"orange"

How do I do this for the entire data.table?

require(dplyr)
require(stringr)

dt %>%
   mutate (fruit = str_match(fruits, x))

Error in rep(NA_character_, n) : invalid 'times' argument
In addition: Warning message:
In regexec(c("book|ball|apple", "flower|orange|cup", "banana|bandana|pen" :
argument 'pattern' has length > 1 and only the first element will be used

What I would like:

                   x       fruit
1    book|ball|apple       apple
2  flower|orange|cup      orange
3 banana|bandana|pen      banana

Upvotes: 1

Views: 476

Answers (3)

rnso

Reputation: 24555

A solution using base R and without str_match:

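# ddf is assumed to be the question's data as a plain data.frame
ddf = data.frame(x=c("book|ball|apple", "flower|orange|cup", "banana|bandana|pen"), stringsAsFactors=FALSE)
fruits = "apple|banana|orange"

reflist = unlist(strsplit(fruits, '\\|'))   # reference words
fruit = character(0)
for(xx in ddf$x){
    ss = unlist(strsplit(xx,'\\|'))         # words in this row
    hit = ss[ss %in% reflist]
    # keep the first match, or NA, so that fruit has exactly one entry per row
    fruit[length(fruit)+1] = if(length(hit) > 0) hit[1] else NA_character_
}
ddf$fruit = fruit
ddf
#                   x  fruit
#1    book|ball|apple  apple
#2  flower|orange|cup orange
#3 banana|bandana|pen banana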
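
For comparison, here is a minimal vectorized sketch in base R that avoids the explicit loop. It is not part of the original answer: it assumes the reference string can be used directly as a regular expression (the `|` then acts as alternation), and it keeps only the first match per row. The names `m` and `fruit` are just illustrative.

```r
x <- c("book|ball|apple", "flower|orange|cup", "banana|bandana|pen")
fruits <- "apple|banana|orange"

# regexpr() locates the first match of the pattern in each element of x
# (it returns -1 where there is no match)
m <- regexpr(fruits, x)

# Start from NA so that rows without a match stay NA
fruit <- rep(NA_character_, length(x))

# regmatches() returns the matched substrings for the matching elements only
fruit[m != -1] <- regmatches(x, m)

fruit
```

Note that this matches substrings, so a word such as "pineapple" would also yield "apple"; the strsplit/%in% approach above compares whole words instead.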

Upvotes: 0

David Arenburg

Reputation: 92292

Or, with data.table (to avoid warnings, it is better to create the data with data.table instead of tbl_dt):

dt[, fruits := mapply(str_match, fruits, x)]
dt
##                     x fruits
## 1:    book|ball|apple  apple
## 2:  flower|orange|cup orange
## 3: banana|bandana|pen banana

Or you could do something similar to @akrun's answer, such as

dt[, fruits := lapply(x, str_match, fruits)]

Upvotes: 2

akrun

Reputation: 887173

dt$fruit <- unlist(lapply(dt$x, str_match, fruits))

dt
#Source: local data table [3 x 2]
#
#                   x  fruit
#1    book|ball|apple  apple
#2  flower|orange|cup orange
#3 banana|bandana|pen banana
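
As a possible shortcut (a sketch, not part of the original answer): stringr's str_extract() is vectorized over its first argument and returns a plain character vector, so the lapply/unlist step may not be needed. The example below uses an ordinary data.frame named `df` for illustration, which is an assumption made to keep it self-contained.

```r
library(stringr)

df <- data.frame(
  x = c("book|ball|apple", "flower|orange|cup", "banana|bandana|pen"),
  stringsAsFactors = FALSE
)
fruits <- "apple|banana|orange"

# One call handles all rows: each element of df$x is searched for the first
# match of the regular expression in fruits; rows with no match give NA
df$fruit <- str_extract(df$x, fruits)

df
```

With the data.table from the question, the same call should fit into `dt[, fruit := str_extract(x, fruits)]`, assuming dt was created with data.table.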

Upvotes: 1
