j20120307

Reputation: 71

Find the row numbers in A that match rows in B

I want to know the index of the rows in A that match rows in B.

Both A and B are data frames. For simplicity just assume:

a1 <- data.frame(a = 1:5, b=letters[1:5])
a2 <- data.frame(a = 1:3, b=letters[1:3])

In this case the expected result is 1, 2, 3.

My full dataset has 500k rows and 18 columns.

Upvotes: 2

Views: 1164

Answers (2)

Estelle Duval

Reputation: 313

You can use this code, which compares column a only:

subset(a1, a1$a %in% a2$a)

It returns the matching rows of a1:

  a b
1 1 a
2 2 b
3 3 c

If you only want column a, pass it as the select argument:

subset(a1, a1$a %in% a2$a, select = a)

  a
1 1
2 2
3 3

I think this will be fast on your data, since %in% is vectorized.
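
The subset() call above returns the matching rows themselves, while the question asks for their row numbers. A minimal sketch of how the same %in% test could give indices instead, assuming column a alone is enough to identify a row (the variable name idx is just for illustration):

```r
a1 <- data.frame(a = 1:5, b = letters[1:5])
a2 <- data.frame(a = 1:3, b = letters[1:3])

# which() turns the logical vector from %in% into row positions in a1.
# Assumption: only column a is compared; column b is ignored here.
idx <- which(a1$a %in% a2$a)
print(idx)
```

If rows have to agree on every column, a single-column test like this can report false matches, so it only fits when a is a unique key.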

Upvotes: 0

josliber

Reputation: 44320

The join.keys function in the plyr package provides a key to each unique row across a pair of input data frames, which makes it pretty straightforward to determine which rows from A appear in B. In the list returned by join.keys, x is the vector of row identifiers for the first data frame and y is the vector of row identifiers for the second data frame.

library(plyr)
with(join.keys(a1, a2), which(x %in% y))
# [1] 1 2 3
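
For comparison, the same idea (one key per row, then a membership test) can be sketched in base R without plyr. This is not what join.keys does internally; it is an illustrative alternative that assumes both data frames have the same columns in the same order, and that the separator character "\r" never appears in the data. The helper name row_key is hypothetical.

```r
a1 <- data.frame(a = 1:5, b = letters[1:5])
a2 <- data.frame(a = 1:3, b = letters[1:3])

# Hypothetical helper: paste all columns of a row into one string key.
# Assumption: "\r" does not occur in any value, so keys cannot collide.
row_key <- function(df) {
  do.call(paste, c(unname(as.list(df)), sep = "\r"))
}

# Row numbers of a1 whose full-row key also appears in a2.
matched <- which(row_key(a1) %in% row_key(a2))
print(matched)
```

Building string keys for 500k rows and 18 columns costs some memory, so it may be worth timing both approaches on the real data.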

Upvotes: 2
