ARandomUser

Reputation: 130

Subset a data.table by a vector of substrings

Assume we have this data.table X:

library(data.table)

# Returns n random alphanumeric strings, each `len` characters long
Random <- function(n = 1, len = 6){
  randomString <- character(n)
  for (i in seq_len(n)){randomString[i] <- paste(sample(c(0:9, letters, LETTERS),
                                   len, replace=TRUE),collapse="")}
  return(randomString)}

X <- data.table(A = rnorm(11000, sd = 0.8),
                B = rnorm(11000, mean = 10, sd = 3),
                C = sample( LETTERS[1:24], 11000, replace=TRUE),
                D = sample( letters[1:24], 11000, replace=TRUE),
                E = round(rnorm(11000,mean=25, sd=3)),
                F = round(runif(n = 11000,min = 1000,max = 25000)),
                G = round(runif(11000,0,200000)),
                H = Random(11000))

I want to subset it by several substrings: here, the rows whose column H contains g, F or d.

There is already a solution for a single substring: How to select R data.table rows based on substring match (a la SQL like)

If we only want g, the data.table package gives:

X[like(H,pattern = "g")]

My problem is doing the same for g, F and d in a single operation.

Vec <- c("g","F","d")
Newtable <- X[like(H,pattern = Vec)]
Warning message:
In grep(pattern, levels(vector)) :
  argument 'pattern' has length > 1 and only the first element will be used

Is there a way to do this without creating 3 tables, merging them and removing duplicates?

Upvotes: 1

Views: 572

Answers (2)

advek88

Reputation: 36

I think you can also use this:

NewTable <- X[grepl("g",H) | grepl("F",H)  | grepl("d",H)]
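The chained grepl calls above can be generalized to a Vec of any length by building one logical vector per substring and OR-ing them together with Reduce. This is a minimal sketch that assumes a small hypothetical table DT in place of X. The fixed = TRUE argument is an added assumption, so each substring is matched literally rather than as a regular expression.

```r
library(data.table)

# Hypothetical small table standing in for X (column H holds short strings)
DT <- data.table(id = 1:5, H = c("abg123", "xyzF00", "qqqqqq", "d9d9d9", "GGGfff"))
Vec <- c("g", "F", "d")

# One logical vector per substring; fixed = TRUE matches each one literally
hits <- lapply(Vec, function(p) grepl(p, DT$H, fixed = TRUE))

# OR them together: TRUE where H contains at least one of the substrings
keep <- Reduce(`|`, hits)

NewTable <- DT[keep]
print(NewTable)
```

Matching is case-sensitive, so "GGGfff" is dropped even though it contains G and f.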

Upvotes: 1

akrun

Reputation: 887901

We can use grep after pasting the vector into a single regular expression, collapsing it with |:

X[grep(paste(Vec, collapse="|"), H)]
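One caveat is that the collapsed string is interpreted as a regular expression. A substring that contains metacharacters such as . or + will therefore match more than intended. The sketch below uses a hypothetical table DT and Vec to illustrate this. It shows one possible workaround: wrap each substring in PCRE's \Q...\E quoting and pass perl = TRUE. This workaround assumes no substring itself contains \E.

```r
library(data.table)

# Hypothetical data: "a.b" is meant literally, not as "a, any char, b"
DT <- data.table(id = 1:4, H = c("a.b123", "axb456", "zzF000", "qqqqqq"))
Vec <- c("a.b", "F")

pattern <- paste(Vec, collapse = "|")          # "a.b|F"

# As a regex, "." matches any character, so "axb456" is also kept
regex_rows <- DT[grepl(pattern, H)]$id

# \Q...\E quotes each substring literally (PCRE only, hence perl = TRUE)
literal_pattern <- paste0("\\Q", Vec, "\\E", collapse = "|")
literal_rows <- DT[grepl(literal_pattern, H, perl = TRUE)]$id

print(regex_rows)
print(literal_rows)
```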

Or, as suggested by @Tensibal, we can use the same approach with like(), passing the pattern vector collapsed with |:

X[like(H, pattern = paste(Vec, collapse="|"))]
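data.table also provides %like%, an infix form of like(), which can read more naturally inside the brackets. The following sketch uses a hypothetical small table DT in place of X. As before, the pattern is one regular expression, so the same metacharacter caveat applies.

```r
library(data.table)

DT <- data.table(id = 1:5, H = c("abg123", "xyzF00", "qqqqqq", "d9d9d9", "GGGfff"))
Vec <- c("g", "F", "d")

# Infix form of like(); the collapsed pattern is "g|F|d"
NewTable <- DT[H %like% paste(Vec, collapse = "|")]
print(NewTable$id)
```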

Upvotes: 4
