Subsetting a Data Table using %in%

Question

A stylized version of my data.table is:

outmat <- data.table(merge(merge(1:5, 1:5, all=TRUE), 1:5, all=TRUE))

What I would like to do is select a subset of rows from this data.table based on whether the value in the 1st column is found in any of the other columns of the same row. The code will be handling matrices of unknown dimension, so I can't just use some sort of "row1 == row2 | row1 == row3".

I wanted to do this using

outmat[row1 %in% names(outmat)[-1], ]

but this ends up returning TRUE if the value in row1 is found in any of the rows of row2 or row3, which is not the intended behavior. Is there some sort of vectorized version of %in% that will achieve my desired result?

To elaborate, what I want to get is the enumeration of 3-tuples from the set 1:5, drawn with replacement, such that the first value is the same as either the second or third value, something like:

1 1 1
1 1 2
1 1 3
1 1 4
1 1 5
...
2 1 2
2 2 1
...
5 5 5

What my code gives me instead is every enumeration of 3-tuples, because it checks whether the first digit (say, 5) ever appears anywhere in the 2nd or 3rd columns, not simply within the same row.
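The difference between the two behaviors can be seen in a small base R sketch. The vectors `first` and `second` are hypothetical toy data, not taken from the table above: `%in%` tests membership against the whole right-hand vector, while `==` compares position by position, which is what a within-row test needs.

```r
# Hypothetical toy columns, used only to illustrate the two operators.
first  <- c(1, 2, 3)
second <- c(3, 9, 9)

# %in% asks: does each element of `first` occur ANYWHERE in `second`?
in_anywhere <- first %in% second
print(in_anywhere)   # FALSE FALSE  TRUE

# == compares element i of `first` with element i of `second` (same row).
same_row <- first == second
print(same_row)      # FALSE FALSE FALSE
```

Here the value 3 counts as a match under `%in%` even though it sits in a different row of `second`.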

eddi · Accepted Answer

One option is to construct the expression and evaluate it:

dt = data.table(a = 1:5, b = c(1,2,4,3,1), c = c(4,2,3,2,2), d = 5:1)
#   a b c d
#1: 1 1 4 5
#2: 2 2 2 4
#3: 3 4 3 3
#4: 4 3 2 2
#5: 5 1 2 1

expr = paste(paste(names(dt)[-1], collapse = paste0(" == ", names(dt)[1], " | ")),
             "==", names(dt)[1])
#[1] "b == a | c == a | d == a"

dt[eval(parse(text = expr))]
#   a b c d
#1: 1 1 4 5
#2: 2 2 2 4
#3: 3 4 3 3
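If building and parsing a string feels fragile, the same "b == a | c == a | d == a" condition can probably be assembled directly from logical vectors. This is only a sketch of a variant, not part of the original answer: it compares every column except the first against the first, then folds the results together with `|` via `Reduce`. The name `keep` is introduced here for illustration.

```r
library(data.table)

# Same example table as above.
dt <- data.table(a = 1:5, b = c(1, 2, 4, 3, 1), c = c(4, 2, 3, 2, 2), d = 5:1)

# One logical vector per non-first column: is that column equal to column 1?
matches <- lapply(dt[, -1, with = FALSE], `==`, dt[[1]])

# Combine the vectors row by row with OR, without any eval(parse(...)).
keep <- Reduce(`|`, matches)

result <- dt[keep]
print(result)
```

For this table `keep` is TRUE for the first three rows only, so the subset matches the one shown above.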

Another option is to just loop through and compare the columns:

dt[rowSums(sapply(dt, '==', dt[[1]])) > 1]
#   a b c d
#1: 1 1 4 5
#2: 2 2 2 4
#3: 3 4 3 3
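Applied to the 3-tuples from the question, the column-comparison approach might look like the sketch below. It assumes the tuples are generated with `CJ()` rather than the nested `merge()` calls in the question, and the column names `x`, `y`, `z` are chosen here for illustration.

```r
library(data.table)

# All 3-tuples drawn with replacement from 1:5 (5^3 = 125 rows).
# CJ() is an assumption here; the question builds its table differently.
tuples <- CJ(x = 1:5, y = 1:5, z = 1:5)

# Compare every column with the first one. The first column always equals
# itself, so a row qualifies when it has more than one TRUE.
hits <- rowSums(sapply(tuples, `==`, tuples[[1]])) > 1

selected <- tuples[hits]
print(nrow(selected))   # 45
print(head(selected, 5))
```

For each first value, 9 of the 25 (y, z) pairs contain it at least once, which gives 5 * 9 = 45 rows. Because nothing in the filter refers to a column by name or count, it should carry over to tables with more columns.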
