kim1801
kim1801

Reputation: 129

Subsetting a data frame when values in two columns match those in a second data frame

I have two large data frames. df1 has two columns of interest (amongst others), these are a list of species names in A1 and a list of 3 letter regions in A2. Each row is an independent record so values may be repeated many times in both columns, and there are many rows (~9 million):

   A1            A2
 species A       AFG  
 species B       THA  
 species B       LOP    
 species C       THA  

My second data frame (df2) essentially lists every region that a species should occur within, so for each value B1, there will be a number of rows, each with a different value in B2 :

  B1         B2  
 species A   AFG
 species A   FLO
 species B   LOP   
 species B   PLA   
 species C   THA   

What I want to do is standardise the values given in A2 (in df1) with those listed in B2 (df2) for each species using R. So for each row in df1, if the value in A1 is the same as B1, AND A2 the same as B2 (if A1==B1 & A2==B2), keep the row. Therefore in my example above, row 2 will be deleted from df1.

I've tried the following but with no success (no rows deleted):

x<-df1[df1$A1 %in% df2$B1 & df1$A2 %in% df2$B2,]

Any suggestions? would the match function be more appropriate?

This is also the first question I've asked on Stack Overflow - my apologies if its not great - any comments on how to improve the question are welcomed!

Cheers!

Upvotes: 2

Views: 6466

Answers (1)

shadow
shadow

Reputation: 22293

You can use merge to do this. Just have to specify the correct by.x and by.y arguments. Here's an example of how to do this:

# your data
df1 <- read.table(text="A1            A2
species_A       AFG  
species_B       THA  
species_B       LOP    
species_C       THA", header=TRUE)

df2 <- read.table(text="B1         B2  
species_A   AFG
species_A   FLO
species_B   LOP   
species_B   PLA   
species_C   THA", header=TRUE)

# merging data.frames
merge(df1, df2[,c("B1", "B2")], by.x=c("A1", "A2"), by.y=c("B1", "B2"))

Upvotes: 4

Related Questions