David Leal

Reputation: 6749

Select rows from one data frame where all column values exist in the second data frame

I would like to select the rows of ds1 whose column values all exist in the corresponding columns of the second data frame, ds2. I found a long way to do it, but surely there is a built-in function that simplifies the process.

ds1 = data.frame(x=c(0,3,2,4,5), y=c(6,7,8,9,10), z=c(11,12,13,14,16))
ds2 = data.frame(x=c(1,2,3,4,5), y=c(6,7,8,9,10), z=c(11,12,13,14,15))

Because the values ds1$x[1] and ds1$z[5] don't exist in the columns ds2$x and ds2$z respectively, those rows should be excluded, so the final result should be:

  x y  z
2 3 7 12
3 2 8 13
4 4 9 14

That is, the rows ds1[2:4,]. This is the long way I found:

result <- matrix(NA, nrow(ds1), ncol(ds1))
count = 1
for (i in names(ds2)) {
    result[,count] <- ds1[, i] %in% ds2[, i]
    count <- count + 1
}

rows = rep(NA, nrow(ds1))
for (i in 1:length(rows)) {
   rows[i] = all(result[i,])
}
# Finally:
ds1[rows,]

I suspect there is a simpler way using some combination of built-in functions. I searched online but didn't find a similar case.

Note: I was experimenting with merge, for example merge(ds1, ds2):

> merge(ds1, ds2)
  x y  z
1 4 9 14

However, because matching values can sit in different rows (for example ds1$x[2] == ds2$x[3] and ds1$x[3] == ds2$x[2]), this does not work, and I do not know how to set the additional merge arguments to get the expected result.
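To illustrate why merge returns a single row here, the following is a minimal sketch using the same ds1 and ds2 as above. By default merge joins on all column names the two data frames share, so it keeps only rows where x, y and z match together in the same row. The names by_all and by_explicit are just illustrative.

```r
ds1 <- data.frame(x = c(0, 3, 2, 4, 5), y = c(6, 7, 8, 9, 10), z = c(11, 12, 13, 14, 16))
ds2 <- data.frame(x = c(1, 2, 3, 4, 5), y = c(6, 7, 8, 9, 10), z = c(11, 12, 13, 14, 15))

# Default behaviour: join on every shared column name (x, y and z)
by_all <- merge(ds1, ds2)

# The same join with the key columns spelled out
by_explicit <- merge(ds1, ds2, by = c("x", "y", "z"))

# Only the row (4, 9, 14) appears as a complete row in both data frames
print(by_all)
```

So merge answers "which complete rows appear in both data frames", which is a stricter condition than checking each column independently.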

Upvotes: 1

Views: 96

Answers (1)

Hack-R

Reputation: 23214

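You can just use %in% and & to express this very simply and concisely. The y column is left out here because every value of ds1$y already occurs in ds2$y; in general you would add a ds1$y %in% ds2$y term as well: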

ds1[ds1$x %in% ds2$x & ds1$z %in% ds2$z,]
  x y  z
2 3 7 12
3 2 8 13
4 4 9 14

This tells R "Select the rows of ds1 where the following 2 conditions are true:

  1. The value of ds1$x is found somewhere in ds2$x
  2. Likewise for ds1$z"
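If there are many columns, writing one %in% term per column gets tedious. As a minimal sketch, and not necessarily the only or best way, the same idea can be applied to every column by pairing up the same-named columns with Map and combining the results with Reduce. It assumes every column name of ds1 also exists in ds2; the names hits, keep and result are illustrative.

```r
ds1 <- data.frame(x = c(0, 3, 2, 4, 5), y = c(6, 7, 8, 9, 10), z = c(11, 12, 13, 14, 16))
ds2 <- data.frame(x = c(1, 2, 3, 4, 5), y = c(6, 7, 8, 9, 10), z = c(11, 12, 13, 14, 15))

# One logical vector per column: is each ds1 value present in the
# same-named ds2 column? (Assumes all names(ds1) exist in ds2.)
hits <- Map(`%in%`, ds1, ds2[names(ds1)])

# Elementwise AND across the columns gives one TRUE/FALSE flag per row
keep <- Reduce(`&`, hits)

result <- ds1[keep, ]
print(result)
```

For the example data this keeps rows 2 to 4 of ds1, the same rows as the two-condition version above.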

Upvotes: 2
