Reputation: 6749
I would like to select the rows of ds1 in which every column value exists in the corresponding column of a second data frame, ds2. I found a long way to do it, but surely there is a built-in function that simplifies the process.
ds1 = data.frame(x=c(0,3,2,4,5), y=c(6,7,8,9,10), z=c(11,12,13,14,16))
ds2 = data.frame(x=c(1,2,3,4,5), y=c(6,7,8,9,10), z=c(11,12,13,14,15))
Because the values ds1$x[1] and ds1$z[5] don't exist in the columns ds2$x and ds2$z respectively, those rows should be excluded, so the final result should be:
  x y  z
2 3 7 12
3 2 8 13
4 4 9 14
That is, the rows ds1[2:4,]. This is the long way I found:
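# One logical column per data frame column: is each ds1 value present in ds2?
result <- matrix(NA, nrow(ds1), ncol(ds1))
count <- 1
for (i in names(ds2)) {
  result[, count] <- ds1[, i] %in% ds2[, i]
  count <- count + 1
}
# A row is kept only if all of its columns matched
rows <- rep(NA, nrow(ds1))
for (i in seq_along(rows)) {
  rows[i] <- all(result[i, ])
}
# Finally:
ds1[rows, ]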
I suspect there is a simpler way using some combination of built-in functions. I googled it, but I didn't find any similar case.
Note: I was playing with merge, for example merge(ds1, ds2):
> merge(ds1, ds2)
  x y  z
1 4 9 14
But because matching column elements can be in different rows, as in ds1$x[2] == ds2$x[3] and ds1$x[3] == ds2$x[2], it does not work, and I do not know how to set the additional merge arguments to get the expected result.
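One way to see why merge returns a single row: by default it joins on every column name the two data frames share, so a row of ds1 survives only if its whole (x, y, z) combination appears as a row of ds2. The sketch below mimics that behavior with a pasted row key; key1 and key2 are names introduced here for illustration only, and the data is the question's.

```r
ds1 <- data.frame(x = c(0, 3, 2, 4, 5), y = c(6, 7, 8, 9, 10), z = c(11, 12, 13, 14, 16))
ds2 <- data.frame(x = c(1, 2, 3, 4, 5), y = c(6, 7, 8, 9, 10), z = c(11, 12, 13, 14, 15))

# Build one string per row from all of its column values
# (key1/key2 are illustrative helpers, not part of the question)
key1 <- do.call(paste, c(ds1, sep = "\r"))
key2 <- do.call(paste, c(ds2, sep = "\r"))

# Row-wise matching: keep ds1 rows whose full combination occurs in ds2
whole_row_match <- ds1[key1 %in% key2, ]
whole_row_match
```

Only the row (4, 9, 14) occurs in both data frames as a complete row, which matches the single row merge returned. The goal in the question is column-wise membership instead, which is a different test.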
Upvotes: 1
Views: 96
Reputation: 23214
You can just use %in% and & to express this very simply and concisely:
ds1[ds1$x %in% ds2$x & ds1$z %in% ds2$z,]
  x y  z
2 3 7 12
3 2 8 13
4 4 9 14
This tells R "Select the rows of ds1 where the following 2 conditions are true:
ds1$x is found somewhere in ds2$x
ds1$z is found somewhere in ds2$z"
Upvotes: 2
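The condition in the answer checks x and z only, which is enough for this data because every y value matches. If the same test should cover every shared column without naming each one, a minimal sketch could combine Map and Reduce from base R; cols, hits and keep are names assumed here for illustration.

```r
ds1 <- data.frame(x = c(0, 3, 2, 4, 5), y = c(6, 7, 8, 9, 10), z = c(11, 12, 13, 14, 16))
ds2 <- data.frame(x = c(1, 2, 3, 4, 5), y = c(6, 7, 8, 9, 10), z = c(11, 12, 13, 14, 15))

# Columns present in both data frames (illustrative name)
cols <- intersect(names(ds1), names(ds2))

# One logical vector per column: is each ds1 value found
# anywhere in the same-named column of ds2?
hits <- Map(`%in%`, ds1[cols], ds2[cols])

# A row is kept only if every one of its columns matched
keep <- Reduce(`&`, hits)

ds1[keep, ]
```

With the question's data this selects the same rows as ds1[2:4,]. Map pairs the columns up by position, so the sketch relies on indexing both data frames with the same cols vector.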