Reputation: 361
Say I have to 2 dfs
df_1 <- data_frame(dates = c(as.Date("2018-07-01"), as.Date("2018-06-01"), as.Date("2018-06-01"), as.Date("2018-06-01"), as.Date("2018-05-01")), x1 = c(10L, 11L, 21L, 21L, 13L), text1 = c("text a", "text b", "text c", "text d", "text e"))
df_2 <- data_frame(dates = c(as.Date("2018-07-01"), as.Date("2018-06-01"), as.Date("2018-05-01"), as.Date("2018-04-01")),x2 = c(10L, 21L, 22L, 23L),text2 = c("text aa", "text bb", "text cc", "text dd"))
I know I could use join function in plyr package to match the first with only one variable
plyr::join(df_2, df_1, type = 'left', match = 'first', by = 'dates')
But with 2 variables, "dates" and "x", it will throw error:
plyr::join(df_2, df_1, type = 'left', match = 'first', by = c('dates' = 'dates', 'x2' = 'x1'))
Also I could use left_join in dplyr with multiple variables
df_2 %>%
left_join(df_1, by = c('dates' = 'dates', 'x2' = 'x1'))
But no the first match argument. Any helps, thanks
Upvotes: 0
Views: 2425
Reputation: 34291
You can't, not directly, multiple matching combinations are always returned using join operations with dplyr
. If you want only the first match you could group by the joining variables and use slice()
on the second table before the join.
df_2 %>%
left_join(df_1 %>%
group_by(dates, x1) %>%
slice(1), by = c('dates' = 'dates', 'x2' = 'x1'))
# A tibble: 4 x 4
dates x2 text2 text1
<date> <int> <chr> <chr>
1 2018-07-01 10 text aa text a
2 2018-06-01 21 text bb text c
3 2018-05-01 22 text cc NA
4 2018-04-01 23 text dd NA
Upvotes: 1
Reputation: 1577
One (rather cheating) solution would be to rename the column that throws an error like so,
colnames(df_1)[2] = "x2"
and then do
plyr::join(df_2, df_1, type = 'left', match = 'first', by = c("dates", "x2"))
which yields
dates x2 text2 text1
1 2018-07-01 10 text aa text a
2 2018-06-01 21 text bb text c
3 2018-05-01 22 text cc <NA>
4 2018-04-01 23 text dd <NA>
Upvotes: 0