Rstudent
Rstudent

Reputation: 885

Merging 2 data frames with different sizes

I have 2 data frames that I would like to merge. In df1observation were recorded on 2 different days. Each record has an index , id1 person identification number and id2 refers the number of the day that recording was made (day had to be different ).There is also a Day variable that records the week day when the recording was made.

In df2 observations were recorded just based on serial number and id1 person identification number. There is just one observation per person. Similarly here there is also a Day variable that records when the recordings were made.

I would like to identify the observations from df2 that were recorded on the same day as in df1.

How can I do this?

Sample data

df1:

    structure(list(index = c(11011202, 11011202, 11011202, 11011202, 
11011203, 11011203, 11011207, 11011207, 11011207, 11011207, 11011209, 
11011209, 11011209, 11011209, 11011210, 11011210, 11011210, 11011210, 
11011211, 11011211, 11011211, 11011211, 11011212, 11011212, 11011212, 
11011212, 11011212, 11011212, 11011212, 11011212, 11011213, 11011213, 
11011213, 11011213, 11011213, 11011213, 11011217, 11011217, 11011219, 
11011219, 11011220, 11011220, 11011220, 11011220, 11011220, 11011220, 
11020202, 11020202, 11020202, 11020202), id1 = c(1, 1, 4, 4, 
1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 
2, 3, 3, 4, 4, 1, 1, 3, 3, 4, 4, 1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 
1, 1, 2, 2), id2 = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 
1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 
2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2), Day = c(5, 1, 5, 
1, 1, 3, 4, 7, 4, 7, 4, 1, 4, 1, 5, 7, 5, 7, 1, 2, 1, 2, 7, 2, 
7, 2, 7, 2, 7, 2, 7, 4, 7, 4, 7, 4, 4, 1, 3, 1, 1, 2, 1, 2, 1, 
2, 4, 7, 4, 7)), row.names = c(NA, -50L), class = c("tbl_df", 
"tbl", "data.frame"))

df2:

    structure(list(Day = c(3, 3, 4, 6, 6, 6, 7, 7, 7, 7, 4, 4, 6, 
6, 6, 4, 3, 7, 7, 5, 5, 7, 5, 6, 6), index = c(11011209, 11011209, 
11011210, 11011212, 11011212, 11011213, 11011213, 11011220, 11011220, 
11020208, 11020212, 11020212, 11020301, 11020301, 11020301, 11020305, 
11020310, 11020315, 11020315, 11020316, 11020316, 11020320, 11020606, 
11020611, 11020611), id1 = c(1, 2, 2, 1, 2, 1, 4, 1, 2, 2, 1, 
2, 1, 2, 3, 1, 1, 1, 2, 1, 2, 2, 1, 1, 2)), row.names = c(NA, 
-25L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 1

Views: 88

Answers (2)

Ian Campbell
Ian Campbell

Reputation: 24770

Edit: Based on your clarifications, I think this is what you wanted. I switched to using tidyverse since your data is stored in tibbles.

library(dplyr)
inner_join(df1,df2,by=c("id1","Day"),suffix=c(".df1",".df2"))
#   index.df1   id1   id2   Day index.df2
#       <dbl> <dbl> <dbl> <dbl>     <dbl>
# 1  11011202     1     1     5  11020316
# 2  11011202     1     1     5  11020606
# 3  11011203     1     2     3  11011209
# 4  11011203     1     2     3  11020310
# 5  11011207     1     1     4  11020212
# 6  11011207     1     1     4  11020305
# 7  11011207     1     2     7  11011220
# 8  11011207     1     2     7  11020315
# 9  11011207     2     1     4  11011210
#10  11011207     2     1     4  11020212
## … with 43 more rows

Upvotes: 1

akrun
akrun

Reputation: 886938

Both the datasets have duplicate by variables, resulting in duplication of rows after the join. One option is to nest by those variables, and then do a join

library(dplyr)
library(tidyr)
df2 %>%
      group_by(Day, index) %>%
      nest %>%
      left_join(df1 %>% 
                   rename(idnew = id1)) %>% 
      unnest(data)

Upvotes: 2

Related Questions