user2303557
user2303557

Reputation: 225

remove rows from data frame whose column values don't match another data frame's column values - R

So I have two data frames of different dimensions.

The first one, x, is about 10,000 rows and long looks like:

Year    ID    Number
2008.1  38573 1
2008.2  24395 3

(a lot of data in between)

2008.4  532   4

The second one, x2, is about 80,000 rows long and looks like:

Year    ID     Number
2008.1  38573  2
2008.2  24395  3

(a lot of data in between)

2008.4  532    4

Basically, I want to remove the rows in the second data that satisfy the following condition: that the Year, ID and Number values in the row don't match any rows of the first data frame. So in the above example, I'd remove row 1 from the second data frame, because the Number doesn't match.

I've tried:

x2new <- x2[(x2$ID == x$ID && x2$Year==x$Year && x2$Number == x$Number),]

But it doesn't work because the lengths of the two data frames are different.

I've tried doing a double for loop to remove rows that don't have all 3 conditions, but R simply can't do that many iterations.

Please help! Thanks.

Upvotes: 1

Views: 6978

Answers (2)

Michele
Michele

Reputation: 8753

I understood that you want to remove all the rows where no one of three columns has a match in the first data frame, and keep all the row where at least one column has a match, right? if so, just do this:

newX2 <- x2[ x2$ID %in% x$ID | x2$Year %in% x$Year | x2$Number %in% x$Number,]

Upvotes: 0

agstudy
agstudy

Reputation: 121568

A simple merge

 merge(dat1,dat2)

Using your data for example:

dat1 <- read.table(text='Year,ID,Number
2008.1,38573,1
2008.4,532,4
2008.2,24395,3',header=TRUE,sep=',')

dat2 <- read.table(text='Year,ID,Number
2008.1,38573,2
2008.4,532,4
2008.2,24395,3',header=TRUE,sep=',')

Then you get :

merge(dat1,dat2)
    Year    ID Number
1 2008.2 24395      3
2 2008.4   532      4

Upvotes: 3

Related Questions