9314197

Reputation: 241

Filter data.table based on string match from another vector

I'm trying to select rows in a data.table. I need the values in column dt$s to start with any of the strings in vector y.

library(data.table)
dt <- data.table(x = 1:5, s = c("a", "ab", "b.c", "db", "d"))
y <- c("a", "b")

Desired result:

   x   s
1: 1   a
2: 2  ab
3: 3 b.c

I would use dt[s %in% y] for a full match, and dt[s %like% "^a"] for a starts-with match on a single string, but I'm not sure how to get a strict starts-with match against a character vector.

My real dataset and character vector are both quite large, so I'd appreciate an efficient solution.

Thanks.

Upvotes: 2

Views: 1480

Answers (2)

akrun

Reputation: 887118

One option is to build the pattern with glue and filter with dplyr and stringr:

library(glue)
library(dplyr)
library(stringr)
dt %>% 
  filter(str_detect(s, glue("^({str_c(y, collapse = '|')})")))
#   x   s
#1: 1   a
#2: 2  ab
#3: 3 b.c

Upvotes: 1

Ronak Shah

Reputation: 388982

You can create a pattern dynamically from y.

library(data.table)
pat <- sprintf('^(%s)', paste0(y, collapse = '|'))
pat
#[1] "^(a|b)"

and use it to subset the data.

dt[grepl(pat, s)]

#   x   s
#1: 1   a
#2: 2  ab
#3: 3 b.c
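
If the entries of y should be matched literally rather than as regular expressions (for example, when they may contain characters such as "."), base R's startsWith() may be a simpler alternative to building a pattern. This is a different technique from the grepl() approach above, shown as a minimal sketch that assumes R >= 3.3.0 and the dt and y from the question; keep and res are names introduced here for illustration only.

```r
library(data.table)

dt <- data.table(x = 1:5, s = c("a", "ab", "b.c", "db", "d"))
y <- c("a", "b")

# startsWith() compares literal prefixes, so nothing in y is
# interpreted as a regex metacharacter.
# One logical vector per prefix, combined with an element-wise OR.
# Assumes y has at least one element.
keep <- Reduce(`|`, lapply(y, function(p) startsWith(dt$s, p)))

# Illustrative name: the rows whose s starts with any prefix in y.
res <- dt[keep]
res
```

Since this loops once per element of y, it is worth benchmarking against the single grepl() pattern on the real data before settling on either.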

Upvotes: 1
