Jennifer Diamond
Jennifer Diamond

Reputation: 113

How to copy a column from a dataframe into another by matching 3 columns in each

I have 2 data frames, and I am trying to copy just one column from one data frame into the other data frame by matching 3 columns in the first data frame to 3 columns in the second data frame. I don't want to completely merge the data frames because my real data frames have too many columns each and I don't want them to be all together.

Here are example data frames:

df1

       ID_num  Terr_Bred  Dispersal_Dist  Year_Bred
  1         1        BAM             760       1987
  2         2        GRE            1006       1993
  3         3        MEW             999       2000

df2

        Mal_ID    Date_Rec   Year  Terr  Pair_ID  Fem_ID
  1          4   3/22/1987   1987   BAM       87       1
  2          7   2/22/1987   1987   YER       43       1
  3          5   1/17/1993   1993   GRE       22       2
  4          8   2/14/1991   1991   GRE       91       2
  5          6   10/1/2000   2000   MEW       65       3

I want this:

       ID_num  Year_Bred   Terr_Bred   Dispersal_Dist Mate_ID_num
  1         1       1987         BAM              760           4
  2         2       1993         GRE             1006           5
  3         3       2000         MEW              999           6

So, I'm trying to add the Mal_ID column from df2 into df1 and rename the column Mate_ID_num. To do this I want to match the ID_num column from df1 to Fem_ID column in df2, Year_Bred column from df1 to Year column in df2, and the Terr_Bred column from df1 to the Terr column in df2. As seen in the examples above, the names for the columns differ in each data frame.

I haven't been able to find a way to do this, I've only found examples where you completely merge data frames or where you add a column based on matching a single column in each data frame, not multiple.

Upvotes: 0

Views: 6583

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269714

1) base This is known as a left join:

by.x <- c("ID_num", "Year_Bred", "Terr_Bred")
by.y <- c("Fem_ID", "Year", "Terr")

df <- merge(df1[by.x], df2[c(by.y, "Mal_ID")], 
         all.x = TRUE, all.y = FALSE, by.x = by.x, by.y = by.y)

names(df["Mal_ID"]) <- "Mal_ID_num"

giving:

> df
  ID_num Year_Bred Terr_Bred Dispersal_Dist Mate_ID_num
1      1      1987       BAM            760           4
2      2      1993       GRE           1006           5
3      3      2000       MEW            999           6

We have used names rather than positional notation since you indicated that the columns do not align in the real problem but for the problem shown in the question (before the question was changed but reflected in the reproducible inputs in the Note) the merge would be slightly shorter with positional notation like this:

df <- merge(df1[1:3], df2[1:4], all.x = TRUE, all.y = FALSE, by = 1:3)

however,

2) It could alternately be done using SQL:

library(sqldf)
sqldf("select a.*, b.Mal_ID Ma1_ID_num
       from df1 a left join df2 b on a.ID_num = b.Fem_ID and 
                                     a.Year_Bred = b.Year and 
                                     a.Terr_Bred = b.Terr")

giving:

  ID_num Year_Bred Terr_Bred Dispersal_Dist Ma1_ID_num
1      1      1987       BAM            760          4
2      2      1993       GRE           1006          5
3      3      2000       MEW            999          6
> 

Note: df1 and df2 in reproducible form are as follows. The question has these originally but then changed them; however, the answer above uses the original df1 and df2 shown below.

Lines1 <- "
       ID_num  Year_Bred  Terr_Bred  Dispersal_Dist  
  1         1       1987        BAM             760
  2         2       1993        GRE            1006
  3         3       2000        MEW             999"
df1 <- read.table(text = Lines1, as.is = TRUE)

Lines2 <- "
        Fem_ID   Year  Terr      Mal_ID  Pair_ID    Date_Rec
  1          1   1987   BAM           4       87   3/22/1987
  2          1   1987   YER           7       43   2/22/1987
  3          2   1993   GRE           5       22   1/17/1993
  4          2   1991   GRE           8       91   2/14/1991
  5          3   2000   MEW           6       65   10/1/2000"
df2 <- read.table(text = Lines2, as.is = TRUE)

Upvotes: 3

Oriol Mirosa
Oriol Mirosa

Reputation: 2826

You can also do the same using dplyr, which is a little more expressive than base R:

library(dplyr)

df <- df1 %>%
  left_join(df2, c("ID_num" = "Fem_ID", "Year_Bred" = "Year", "Terr_Bred" = "Terr")) %>%
  rename(Mate_ID_num = Mal_ID) %>%
  select(1:5)

Upvotes: 1

Related Questions