VincentLin
VincentLin

Reputation: 361

How to use left join in tidyverse package only match the first one with multiple different variables in two dfs?

Say I have to 2 dfs

df_1 <- data_frame(dates = c(as.Date("2018-07-01"), as.Date("2018-06-01"), as.Date("2018-06-01"), as.Date("2018-06-01"), as.Date("2018-05-01")), x1 = c(10L, 11L, 21L, 21L, 13L), text1 = c("text a", "text b", "text c", "text d", "text e"))
df_2 <- data_frame(dates = c(as.Date("2018-07-01"), as.Date("2018-06-01"), as.Date("2018-05-01"), as.Date("2018-04-01")),x2 = c(10L, 21L, 22L, 23L),text2 = c("text aa", "text bb", "text cc", "text dd"))

I know I could use join function in plyr package to match the first with only one variable

plyr::join(df_2, df_1, type = 'left', match = 'first', by = 'dates')

But with 2 variables, "dates" and "x", it will throw error:

plyr::join(df_2, df_1, type = 'left', match = 'first', by = c('dates' = 'dates', 'x2' = 'x1'))

Also I could use left_join in dplyr with multiple variables

df_2 %>% 
  left_join(df_1, by = c('dates' = 'dates', 'x2' = 'x1'))

But no the first match argument. Any helps, thanks

Upvotes: 0

Views: 2425

Answers (2)

lroha
lroha

Reputation: 34291

You can't, not directly, multiple matching combinations are always returned using join operations with dplyr. If you want only the first match you could group by the joining variables and use slice() on the second table before the join.

df_2 %>% 
  left_join(df_1 %>%
              group_by(dates, x1) %>%
              slice(1), by = c('dates' = 'dates', 'x2' = 'x1'))

# A tibble: 4 x 4
  dates         x2 text2   text1 
  <date>     <int> <chr>   <chr> 
1 2018-07-01    10 text aa text a
2 2018-06-01    21 text bb text c
3 2018-05-01    22 text cc NA    
4 2018-04-01    23 text dd NA    

Upvotes: 1

Taufi
Taufi

Reputation: 1577

One (rather cheating) solution would be to rename the column that throws an error like so,

colnames(df_1)[2] = "x2" 

and then do

plyr::join(df_2, df_1, type = 'left', match = 'first', by = c("dates", "x2"))

which yields

       dates x2   text2  text1
1 2018-07-01 10 text aa text a
2 2018-06-01 21 text bb text c
3 2018-05-01 22 text cc   <NA>
4 2018-04-01 23 text dd   <NA>

Upvotes: 0

Related Questions