Reputation: 129
I have two large data frames. df1 has two columns of interest (amongst others), these are a list of species names in A1 and a list of 3 letter regions in A2. Each row is an independent record so values may be repeated many times in both columns, and there are many rows (~9 million):
A1 A2 species A AFG species B THA species B LOP species C THA
My second data frame (df2) essentially lists every region that a species should occur within, so for each value B1, there will be a number of rows, each with a different value in B2 :
B1 B2 species A AFG species A FLO species B LOP species B PLA species C THA
What I want to do is standardise the values given in A2 (in df1) with those listed in B2 (df2) for each species using R. So for each row in df1, if the value in A1 is the same as B1, AND A2 the same as B2 (if A1==B1 & A2==B2), keep the row. Therefore in my example above, row 2 will be deleted from df1.
I've tried the following but with no success (no rows deleted):
x<-df1[df1$A1 %in% df2$B1 & df1$A2 %in% df2$B2,]
Any suggestions? would the match function be more appropriate?
This is also the first question I've asked on Stack Overflow - my apologies if its not great - any comments on how to improve the question are welcomed!
Cheers!
Upvotes: 2
Views: 6466
Reputation: 22293
You can use merge
to do this. Just have to specify the correct by.x
and by.y
arguments. Here's an example of how to do this:
# your data
df1 <- read.table(text="A1 A2
species_A AFG
species_B THA
species_B LOP
species_C THA", header=TRUE)
df2 <- read.table(text="B1 B2
species_A AFG
species_A FLO
species_B LOP
species_B PLA
species_C THA", header=TRUE)
# merging data.frames
merge(df1, df2[,c("B1", "B2")], by.x=c("A1", "A2"), by.y=c("B1", "B2"))
Upvotes: 4