I have a data frame with a column called listA, and a separate vector listB. I want to keep only those rows of the data frame whose listA value matches an entry in listB, so I have:
newData <- mydata[mydata$listA %in% listB,]
However, some entries of listA are in the format "ABC /// DEF", where both ABC and DEF are possible entries in listB. I want to keep the rows whose listA contains any word that matches an entry in listB. So if listB contained "ABC", the row with "ABC /// DEF" would be included in newData. I found the strsplit function, but things like
strsplit(mydata$listA," ") %in% listB
always return FALSE, presumably because %in% checks whether each row's whole split result (a vector such as c("ABC", "///", "DEF")) is a single entry in listB.
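A minimal base-R sketch of the row-wise fix, using hypothetical toy data in place of mydata and listB and assuming the delimiter is exactly " /// ": split each listA value, then keep a row if any of its tokens is %in% listB. Calling %in% on the list itself compares each list element as one unit, which is why multi-token rows never match.

```r
# Hypothetical toy data standing in for the question's mydata and listB
mydata <- data.frame(
  id    = 1:4,
  listA = c("ABC /// DEF", "XYZ", "GHI /// ABC", "ABCD"),
  stringsAsFactors = FALSE
)
listB <- c("ABC", "QRS")

# strsplit() returns a list: one character vector of tokens per row
parts <- strsplit(mydata$listA, " /// ", fixed = TRUE)

# Test each row's tokens separately; keep the row if any token is in listB
keep <- vapply(parts, function(tokens) any(tokens %in% listB), logical(1))
newData <- mydata[keep, ]
```

Because the comparison is on whole tokens, "ABCD" is not kept even though it contains "ABC".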
match(word_vector, target_vector) accepts vectors for both arguments, which is what you want (note: vectors, not lists). In fact, the %in% operator is a thin wrapper around match(), defined as match(x, table, nomatch = 0L) > 0L, as its help page tells you.
The stringi package's stri_match_* functions may well do directly what you want; they are all vectorized and are typically much faster than splitting with strsplit() and then calling match():
stri_match_all, stri_match_all_regex, stri_match_first, stri_match_first_regex, stri_match_last, stri_match_last_regex
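As a sketch of the stri_match_* route (assuming the stringi package is installed, and using hypothetical toy data), one regex built from listB can be matched against the unsplit listA strings with stri_match_first_regex(); the captured group holds the listB entry that matched, or NA. Wrapping each entry in \Q...\E makes ICU treat it literally (assuming no entry itself contains "\E"), and the whitespace/anchor guards stop "ABC" from matching inside "ABCD". The names alternatives, pattern and matched are illustrative.

```r
library(stringi)

# Hypothetical toy data standing in for the question's mydata and listB
mydata <- data.frame(
  listA = c("ABC /// DEF", "XYZ", "GHI /// ABC", "ABCD"),
  stringsAsFactors = FALSE
)
listB <- c("ABC", "QRS")

# One alternation of listB entries, each quoted literally with \Q...\E
alternatives <- paste0("\\Q", listB, "\\E", collapse = "|")
# A token must be bounded by string start/end or whitespace
pattern <- paste0("(?:^|\\s)(", alternatives, ")(?:\\s|$)")

# Column 1 is the whole match, column 2 the captured listB entry (NA if none)
m <- stri_match_first_regex(mydata$listA, pattern)
matched <- m[, 2]
newData <- mydata[!is.na(matched), , drop = FALSE]
```

If only the logical filter is needed, stringi::stri_detect_regex(mydata$listA, pattern) returns it directly without building the match matrix.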
Also, you probably won't need an explicit split function at all, but if you must split, use stringi::stri_split_*() rather than base::strsplit().
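If you do split, here is a sketch with stringi::stri_split_fixed() (assuming stringi is installed and the delimiter is exactly " /// "): flatten the pieces once and make a single vectorized %in% call, rather than one call per row, then map the hits back to their rows. This relies on %in% (that is, match()) accepting whole vectors on both sides. The names row_id, hit and keep are illustrative.

```r
library(stringi)

# Hypothetical toy inputs
listA <- c("ABC /// DEF", "XYZ", "GHI /// ABC", "ABCD")
listB <- c("ABC", "QRS")

# Split on the fixed (non-regex) delimiter: one character vector per row
parts <- stri_split_fixed(listA, " /// ")

# Flatten once and remember which row each token came from
row_id <- rep(seq_along(parts), lengths(parts))
hit    <- unlist(parts) %in% listB   # one vectorized lookup for all tokens

# A row is kept if any of its tokens matched
keep <- logical(length(parts))
keep[unique(row_id[hit])] <- TRUE
```

The resulting keep vector can then be used as mydata[keep, ].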
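Note on performance: avoid splitting strings in R whenever possible. Splitting allocates many small objects (one character vector per input string plus the new strings themselves), which inflates the cons-cell (Ncells) and vector-cell (Vcells) counts that gc() reports and adds garbage-collection work. That memory is reclaimed by the garbage collector, so it is overhead rather than a true leak, but it is yet another reason why stringi's functions, which can work on the unsplit strings, are very efficient.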
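One way to see the allocation cost described above, as a rough base-R sketch on a large hypothetical input: reset gc()'s "max used" counters, split, and compare the peaks. The exact numbers depend on R version, platform and input, so read them relatively rather than as absolute costs.

```r
# Hypothetical input: many short "A /// B" strings
n <- 1e5
listA <- paste(sprintf("G%05d", seq_len(n)), "///", sprintf("H%05d", seq_len(n)))

# Reset the "max used" counters, then record them as a baseline
baseline <- gc(reset = TRUE)[, "max used"]

parts <- strsplit(listA, " /// ", fixed = TRUE)

# Peak Ncells/Vcells since the reset; the difference is roughly the split's cost
peak  <- gc()[, "max used"]
extra <- peak - baseline
print(extra)
```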