I want to find the indices of the rows in A that match rows in B.
Both A and B are data frames. For simplicity, assume:
a1 <- data.frame(a = 1:5, b=letters[1:5])
a2 <- data.frame(a = 1:3, b=letters[1:3])
In this case the result should be 1, 2, 3.
My full dataset has 500k rows and 18 columns.
You can use this code:
subset(a1, a %in% a2$a)
It returns:
#   a b
# 1 1 a
# 2 2 b
# 3 3 c
If you only want column a, add a select argument:
subset(a1, a %in% a2$a, select = a)
#   a
# 1 1
# 2 2
# 3 3
Note that this matches on column a only, not on whole rows. I think it should be fast enough on your data.
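If you need the row indices themselves (which is what the question asks for) rather than the matching rows, a minimal sketch is to wrap the same %in% test in which(). Like the subset() call, this compares column a only; the name idx is just for illustration.

```r
a1 <- data.frame(a = 1:5, b = letters[1:5])
a2 <- data.frame(a = 1:3, b = letters[1:3])

# Logical vector: does each value of a1$a occur anywhere in a2$a?
# which() turns it into the positions (row indices) of the TRUE entries.
idx <- which(a1$a %in% a2$a)
idx
# [1] 1 2 3
```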
The join.keys function in the plyr package provides a key for each unique row across a pair of input data frames, which makes it straightforward to determine which rows from A appear in B. In the list returned by join.keys, x is the vector of row identifiers for the first data frame and y is the vector of row identifiers for the second data frame.
library(plyr)
with(join.keys(a1, a2), which(x %in% y))
# [1] 1 2 3
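The idea behind join.keys (one shared integer key per distinct row across both frames, followed by a membership test) can also be sketched in base R without plyr. This is not plyr's implementation, only an approximation that assumes the "\r" character never appears in the data, because it is used as the column separator when building a string key for each row. The names joint, row_str, keys, x, y and idx are illustrative; x and y are chosen to mirror the join.keys output.

```r
a1 <- data.frame(a = 1:5, b = letters[1:5])
a2 <- data.frame(a = 1:3, b = letters[1:3])

# Stack both frames so identical rows can receive the same key
joint <- rbind(a1, a2)

# One string per row, pasting all columns together.
# Assumption: "\r" does not occur in any value, so keys cannot collide.
row_str <- do.call(paste, c(joint, sep = "\r"))

# Integer key per distinct row, shared across both frames
keys <- match(row_str, unique(row_str))

# Split the keys back into the rows of a1 (x) and the rows of a2 (y)
x <- keys[seq_len(nrow(a1))]
y <- keys[nrow(a1) + seq_len(nrow(a2))]

# Rows of a1 whose key also appears among the keys of a2
idx <- which(x %in% y)
idx
# [1] 1 2 3
```

Because every column goes into the key, this compares whole rows rather than a single column.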