Subset variables matching pairs of values in R

Question

From a given data frame (myData, in the example below), I would like to subset the variables with values matching at least one pair of values stored in a list (myList, in the example below).

myList <- list(c(8,15), c(2,3))

v1 <- c(1, 2, 3, 8, 15)
v2 <- c(3, 7, 8, 9, 10)
v3 <- c(2, 4, 5, 6, 7)
v4 <- c(8, 15, 6, 7, 9)

myData <- cbind(v1, v2, v3, v4)

Ideally, the subset should consist only of v1 and v4, because v1 contains both the pair 8, 15 and the pair 2, 3, and v4 contains the pair 8, 15.

I tried to use the which statement for a single pair (i.e., 8, 15), as follows:

subset <- myData[which(myData==unlist(myList[[1]][1]) & myData==unlist(myList[[1]][2]))]

However, the output is an empty vector. Am I missing something in the which statement? Also, how could I extend the code to handle more than one pair of values?

Tobias Dekker · Accepted Answer

I found a solution for this problem:

myData[, unique(which(sapply(myList, function(y) apply(myData, 2, function(x) all(y %in% x))), arr.ind = TRUE)[, 1])]
     v1 v4
[1,]  1  8
[2,]  2 15
[3,]  3  6
[4,]  8  7
[5,] 15  9

This is a somewhat ugly one-liner, so here is an explanation. The inner apply checks, for each column of myData, whether all values of one pair from myList occur in that column. The sapply repeats this check for every pair in myList, which produces a logical matrix with one row per column of myData and one column per pair. The which call with arr.ind = TRUE returns the row and column indices of the TRUE entries. Only the row indices matter here, because each row of that matrix corresponds to a column of myData; unique removes the duplicates that arise when a column matches more than one pair. It is a bit complicated, but hopefully it helps :)
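For readability, the same logic can be split into named steps. The following is only a sketch under a few assumptions: the names hits, keep and result are introduced here for illustration, vapply is swapped in for sapply so that the intermediate result stays a matrix even when myList holds a single pair, and rowSums replaces the which/unique combination. It also shows why the attempt in the question returns nothing: a single element can never equal 8 and 15 at the same time, so the combined condition is FALSE everywhere.

```r
# Data from the question
myList <- list(c(8, 15), c(2, 3))
myData <- cbind(
  v1 = c(1, 2, 3, 8, 15),
  v2 = c(3, 7, 8, 9, 10),
  v3 = c(2, 4, 5, 6, 7),
  v4 = c(8, 15, 6, 7, 9)
)

# Why the original attempt is empty: no element equals both 8 and 15.
any(myData == 8 & myData == 15)

# Step 1: for each pair, test every column for the presence of both values.
# hits is a logical matrix: one row per column of myData, one column per pair.
hits <- vapply(
  myList,
  function(pair) apply(myData, 2, function(col) all(pair %in% col)),
  logical(ncol(myData))
)

# Step 2: keep the columns of myData that match at least one pair.
keep <- rowSums(hits) > 0

# Step 3: subset; drop = FALSE keeps a matrix even if only one column matches.
result <- myData[, keep, drop = FALSE]
result
```

Like the one-liner above, this tests only whether both values of a pair occur somewhere in a column, not whether they are adjacent. If adjacency matters, the check inside apply would need to change.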
