From a given data frame (myData, in the example below), I would like to subset the variables whose values match at least one pair of values stored in a list (myList, in the example below).
myList <- list(c(8,15), c(2,3))
v1 <- c(1, 2, 3, 8, 15)
v2 <- c(3, 7, 8, 9, 10)
v3 <- c(2, 4, 5, 6, 7)
v4 <- c(8, 15, 6, 7, 9)
myData <- cbind(v1, v2, v3, v4)
Ideally, the subset should consist only of v1 and v4, because the pairs 8,15 and 2,3 both occur in v1, and the pair 8,15 occurs in v4.
I tried to use the which function for a single pair (i.e., 8, 15), as follows:
subset <- myData[which(myData==unlist(myList[[1]][1]) & myData==unlist(myList[[1]][2]))]
Still, the output is an empty integer vector. Am I missing something in the which call? Also, how could I extend the code to more than one pair of values?
Upvotes: 2
I found a solution for this problem:
myData[, unique(which(sapply(myList, function(y) apply(myData, 2, function(x) all(y %in% x))), arr.ind = TRUE)[, 1])]
v1 v4
[1,] 1 8
[2,] 2 15
[3,] 3 6
[4,] 8 7
[5,] 15 9
The one-liner is a bit ugly, so here is an explanation. The apply call checks, for each column of myData, whether all values of one element of myList can be found in that column. The sapply call repeats this check for every element of myList, which gives a logical matrix with one row per column of myData and one column per pair. The which call, with arr.ind set, returns the row and column indices of the TRUE entries. Only the row indices matter here, because each row of that matrix corresponds to a column of myData; unique drops the duplicates that appear when a column matches more than one pair. It is a bit complicated, but have a look at it, hopefully it helps :)
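The same logic can also be unpacked into named steps, which may be easier to read. The following is only a sketch, not the answer's exact method: contains_pair, attempt, keep and result are names made up for illustration. It also shows why the attempt in the question returns nothing: myData == 8 & myData == 15 asks each single cell to equal both values at once, which can never be true. As in the one-liner above, a pair counts as a match when both of its values occur anywhere in the column, not necessarily next to each other.

```r
myList <- list(c(8, 15), c(2, 3))
myData <- cbind(
  v1 = c(1, 2, 3, 8, 15),
  v2 = c(3, 7, 8, 9, 10),
  v3 = c(2, 4, 5, 6, 7),
  v4 = c(8, 15, 6, 7, 9)
)

# The attempt from the question: each cell is compared with 8 AND 15 at the
# same time, so the condition is FALSE everywhere and which() finds nothing.
attempt <- which(myData == 8 & myData == 15)

# Hypothetical helper: TRUE if every value of `pair` occurs in `column`.
contains_pair <- function(column, pair) all(pair %in% column)

# For each column of myData, check whether at least one pair from myList
# occurs completely in that column.
keep <- apply(myData, 2, function(column) {
  any(vapply(myList, function(pair) contains_pair(column, pair), logical(1)))
})

# drop = FALSE keeps the result a matrix even if only one column matches.
result <- myData[, keep, drop = FALSE]
```

Here keep is a named logical vector with one entry per column of myData, so it can be inspected directly to see which variables were selected before subsetting.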
Upvotes: 2