Reputation: 225
So I have two data frames of different dimensions.
The first one, x, is about 10,000 rows and long looks like:
Year ID Number
2008.1 38573 1
2008.2 24395 3
(a lot of data in between)
2008.4 532 4
The second one, x2, is about 80,000 rows long and looks like:
Year ID Number
2008.1 38573 2
2008.2 24395 3
(a lot of data in between)
2008.4 532 4
Basically, I want to remove the rows in the second data that satisfy the following condition: that the Year, ID and Number values in the row don't match any rows of the first data frame. So in the above example, I'd remove row 1 from the second data frame, because the Number doesn't match.
I've tried:
x2new <- x2[(x2$ID == x$ID && x2$Year==x$Year && x2$Number == x$Number),]
But it doesn't work because the lengths of the two data frames are different.
I've tried doing a double for loop to remove rows that don't have all 3 conditions, but R simply can't do that many iterations.
Please help! Thanks.
Upvotes: 1
Views: 6978
Reputation: 8753
I understood that you want to remove all the rows where no one of three columns has a match in the first data frame, and keep all the row where at least one column has a match, right? if so, just do this:
newX2 <- x2[ x2$ID %in% x$ID | x2$Year %in% x$Year | x2$Number %in% x$Number,]
Upvotes: 0
Reputation: 121568
A simple merge
merge(dat1,dat2)
Using your data for example:
dat1 <- read.table(text='Year,ID,Number
2008.1,38573,1
2008.4,532,4
2008.2,24395,3',header=TRUE,sep=',')
dat2 <- read.table(text='Year,ID,Number
2008.1,38573,2
2008.4,532,4
2008.2,24395,3',header=TRUE,sep=',')
Then you get :
merge(dat1,dat2)
Year ID Number
1 2008.2 24395 3
2 2008.4 532 4
Upvotes: 3