JHall651

How to subset a data.table using a vector of strings in R

I have a large data.table with lines of text in each row. I am trying to subset the data.table by finding lines that contain one of several words. Here is what I have tried.

library(data.table)

textDt <- data.table(LinesOfText = c("There was a small frog.",
                                     "Most of the time I ate chicken",
                                     "There are so many places to stay here.",
                                     "People on stackoverflow are tremendously helpful.",
                                     "Why do grapefruits cause weird drug interactions?",
                                     "If I were tiny I could fit in there"))

targetWords <- c("small", "tiny", "no room", "cramped", "mini")

# Fails: %in% returns one logical per target word (length 5), not one per row (6),
# and it tests exact whole-string equality rather than "line contains word"
targetDt <- textDt[targetWords %in% LinesOfText]

This always results in an error. I know there must be an easy solution that eludes me.
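To see why the attempt fails, here is a small base-R sketch (no data.table needed) that compares what `%in%` computes with the per-line test the subset actually needs. The names `lines`, `inWhole` and `hasWord` are illustrative only, and the substring test uses `grepl(..., fixed = TRUE)`, which is a swapped-in technique rather than anything from the question.

```r
lines <- c("There was a small frog.",
           "Most of the time I ate chicken",
           "There are so many places to stay here.",
           "People on stackoverflow are tremendously helpful.",
           "Why do grapefruits cause weird drug interactions?",
           "If I were tiny I could fit in there")
targetWords <- c("small", "tiny", "no room", "cramped", "mini")

# %in% asks "is each target word exactly equal to some whole line?"
# The result has one entry per target word (5), not one per line (6).
inWhole <- targetWords %in% lines
print(inWhole)
# [1] FALSE FALSE FALSE FALSE FALSE

# The subset needs one logical per line: TRUE if any word occurs inside it.
# fixed = TRUE treats each word as a literal substring, not a regex.
hasWord <- Reduce(`|`, lapply(targetWords, grepl, x = lines, fixed = TRUE))
print(hasWord)
# [1]  TRUE FALSE FALSE FALSE FALSE  TRUE
```

A length-6 logical like `hasWord` is the shape that data.table's `i` argument expects for row selection.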

Answers (1)

Yannis Vassiliadis

I like using stringr because I believe it's faster. So here's a solution based on that:

library(stringr)
# Combine the words into one regex alternation
pattern <- paste(targetWords, collapse = "|")
# "small|tiny|no room|cramped|mini"

# str_detect() returns one logical per row, so it can be used directly as i
targetDt <- textDt[str_detect(LinesOfText, pattern)]
targetDt
#                           LinesOfText 
#1:             There was a small frog.
#2: If I were tiny I could fit in there
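A base-R variant of the same idea is sketched below. It uses `grepl()` instead of `str_detect()` and wraps the alternation in `\\b` word boundaries so that, for example, `"mini"` does not match inside `"minimum"`. The word-boundary behaviour is an assumption added here, not part of the answer; drop the `\\b` parts if partial-word matches are wanted. The test string `"a minimum stay"` is hypothetical.

```r
library(data.table)

textDt <- data.table(LinesOfText = c("There was a small frog.",
                                     "Most of the time I ate chicken",
                                     "There are so many places to stay here.",
                                     "People on stackoverflow are tremendously helpful.",
                                     "Why do grapefruits cause weird drug interactions?",
                                     "If I were tiny I could fit in there"))
targetWords <- c("small", "tiny", "no room", "cramped", "mini")

# Build "\\b(small|tiny|no room|cramped|mini)\\b": any target word, as a whole word
pattern <- paste0("\\b(", paste(targetWords, collapse = "|"), ")\\b")

# grepl() returns one logical per row, which data.table uses to select rows in order
targetDt <- textDt[grepl(pattern, LinesOfText, perl = TRUE)]
print(targetDt)

# The boundaries stop "mini" from matching part of a longer word (hypothetical input)
print(grepl(pattern, "a minimum stay", perl = TRUE))
# [1] FALSE
```

The same `pattern` string can also be passed to `str_detect()`. If any target word could contain regex metacharacters such as `.` or `(`, it would need escaping before being pasted into the pattern.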
