Reputation: 113
I have 2 data frames, and I am trying to copy just one column from one data frame into the other data frame by matching 3 columns in the first data frame to 3 columns in the second data frame. I don't want to completely merge the data frames because my real data frames have too many columns each and I don't want them to be all together.
Here are example data frames:
df1
ID_num Terr_Bred Dispersal_Dist Year_Bred
1 1 BAM 760 1987
2 2 GRE 1006 1993
3 3 MEW 999 2000
df2
Mal_ID Date_Rec Year Terr Pair_ID Fem_ID
1 4 3/22/1987 1987 BAM 87 1
2 7 2/22/1987 1987 YER 43 1
3 5 1/17/1993 1993 GRE 22 2
4 8 2/14/1991 1991 GRE 91 2
5 6 10/1/2000 2000 MEW 65 3
I want this:
ID_num Year_Bred Terr_Bred Dispersal_Dist Mate_ID_num
1 1 1987 BAM 760 4
2 2 1993 GRE 1006 5
3 3 2000 MEW 999 6
So, I'm trying to add the Mal_ID column from df2 into df1 and rename the column Mate_ID_num. To do this I want to match the ID_num column from df1 to Fem_ID column in df2, Year_Bred column from df1 to Year column in df2, and the Terr_Bred column from df1 to the Terr column in df2. As seen in the examples above, the names for the columns differ in each data frame.
I haven't been able to find a way to do this, I've only found examples where you completely merge data frames or where you add a column based on matching a single column in each data frame, not multiple.
Upvotes: 0
Views: 6583
Reputation: 269714
1) base This is known as a left join:
by.x <- c("ID_num", "Year_Bred", "Terr_Bred")
by.y <- c("Fem_ID", "Year", "Terr")
df <- merge(df1[by.x], df2[c(by.y, "Mal_ID")],
all.x = TRUE, all.y = FALSE, by.x = by.x, by.y = by.y)
names(df["Mal_ID"]) <- "Mal_ID_num"
giving:
> df
ID_num Year_Bred Terr_Bred Dispersal_Dist Mate_ID_num
1 1 1987 BAM 760 4
2 2 1993 GRE 1006 5
3 3 2000 MEW 999 6
We have used names rather than positional notation since you indicated that the columns do not align in the real problem but for the problem shown in the question (before the question was changed but reflected in the reproducible inputs in the Note) the merge would be slightly shorter with positional notation like this:
df <- merge(df1[1:3], df2[1:4], all.x = TRUE, all.y = FALSE, by = 1:3)
however,
2) It could alternately be done using SQL:
library(sqldf)
sqldf("select a.*, b.Mal_ID Ma1_ID_num
from df1 a left join df2 b on a.ID_num = b.Fem_ID and
a.Year_Bred = b.Year and
a.Terr_Bred = b.Terr")
giving:
ID_num Year_Bred Terr_Bred Dispersal_Dist Ma1_ID_num
1 1 1987 BAM 760 4
2 2 1993 GRE 1006 5
3 3 2000 MEW 999 6
>
Note: df1
and df2
in reproducible form are as follows. The question has these originally but then changed them; however, the answer above uses the original df1
and df2
shown below.
Lines1 <- "
ID_num Year_Bred Terr_Bred Dispersal_Dist
1 1 1987 BAM 760
2 2 1993 GRE 1006
3 3 2000 MEW 999"
df1 <- read.table(text = Lines1, as.is = TRUE)
Lines2 <- "
Fem_ID Year Terr Mal_ID Pair_ID Date_Rec
1 1 1987 BAM 4 87 3/22/1987
2 1 1987 YER 7 43 2/22/1987
3 2 1993 GRE 5 22 1/17/1993
4 2 1991 GRE 8 91 2/14/1991
5 3 2000 MEW 6 65 10/1/2000"
df2 <- read.table(text = Lines2, as.is = TRUE)
Upvotes: 3
Reputation: 2826
You can also do the same using dplyr
, which is a little more expressive than base R:
library(dplyr)
df <- df1 %>%
left_join(df2, c("ID_num" = "Fem_ID", "Year_Bred" = "Year", "Terr_Bred" = "Terr")) %>%
rename(Mate_ID_num = Mal_ID) %>%
select(1:5)
Upvotes: 1