I am trying to accomplish the same goal as is resolved in this question, but I want to filter the table by two grep statements. When I try this:
DT[grep("word1", column) | grep("word2", column)]
I get this error:
Warning message:
In grep("word1", column) | grep("word2", column) :
longer object length is not a multiple of shorter object length
And when I try to combine this logic with an assignment (:=) in the j argument of the data.table, I get all kinds of weirdness. Basically, it looks like the OR operator | doesn't work with grep() calls in the i argument of a data.table.
I came up with a messy workaround:
DT.a <- DT[grep("word1", column)]
DT.b <- DT[grep("word2", column)]
DT.all <- rbind(DT.a,DT.b)
but I'm hoping there's a better way to accomplish this goal. Any ideas?
Upvotes: 2
The issue here turned out to be a combination of function choice and the placement of the OR operator |. DT[grep("word1", column) | grep("word2", column)] does not do what it looks like, because each grep() returns a vector of matching row indices (integers), and the two vectors can have different lengths depending on the data. Applying | to them combines the indices element by element, recycling the shorter vector (hence the warning), and every non-zero index counts as TRUE, so the result says nothing about which rows actually matched. grepl() is a more appropriate function to use here because it returns one logical value per row indicating whether the regex matched, and the OR can then be placed within the regex pattern string itself. Combining two grepl() calls with | also works, since both results have the same length.
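To see why the grep() version misbehaves, here is a small base R sketch. The example strings are made up for illustration and are not from the original table.

```r
# Hypothetical example data standing in for DT$column.
column <- c("word1 here", "nothing", "word2 here",
            "word1 and word2", "word1 again")

idx1 <- grep("word1", column)   # integer positions of matches: 1 4 5
idx2 <- grep("word2", column)   # integer positions of matches: 3 4

# | treats every non-zero integer as TRUE and recycles the shorter vector.
# suppressWarnings() only hides the "longer object length" warning
# quoted in the question; the result is still meaningless as a row filter.
bad <- suppressWarnings(idx1 | idx2)
print(bad)    # one TRUE per element of the longer index vector

# grepl() returns one logical per element of column, so | lines up row by row.
ok <- grepl("word1", column) | grepl("word2", column)
print(ok)
```

Used in i, a logical vector like bad would just select the first few rows of the table, regardless of their content.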
Solution:
DT[grepl("word1|word2", column)]
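A minimal sketch of the same idea on a toy table, including the := case from the question. The data and the flag column name are assumptions made up for this example.

```r
library(data.table)

# Hypothetical table; only the name "column" comes from the question.
DT <- data.table(
  id     = 1:5,
  column = c("word1 here", "nothing", "word2 here",
             "word1 and word2", "word1 again")
)

# Filter with the alternation inside the pattern.
hits <- DT[grepl("word1|word2", column)]

# Equivalent filter with two grepl() calls combined by |.
hits2 <- DT[grepl("word1", column) | grepl("word2", column)]

# Assignment by reference on the matching rows only.
# "flag" is a hypothetical new column; non-matching rows get NA.
DT[grepl("word1|word2", column), flag := TRUE]

print(hits)
print(DT)
```

Unlike the rbind() workaround, this keeps the original row order and returns a row that matches both words only once.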
Upvotes: 3