logjammin
logjammin

Reputation: 1211

In R, conditionally left join two tables depending on the value of an indicator variable in the left-hand-side table

Background

I've got two dataframes about baseball cards and their market value. This information comes from Baseball Card "Almanacs", guides to cards' value published every year.

The first, d, is a table with the card_id of each card, as well as an indicator almanac_flag, which tells you if the card_id in that row came from the either the 1999 or 2009 editions of the Baseball Card Almanac:

d <- data.frame(card_id = c("48","2100","F7","2729","F4310","27700"), 
                almanac_flag = c(0,0,1,0,1,0), # 0 = 1999 Almanac, 1 = 2009 almanac 
                stringsAsFactors=T) 

It looks like this:

d

The second dataframe is d2, which contains (not all) equivalent id's for 1999 and 2009, along with a description of which baseball player is depicted in that card. Note that d2 doesn't have all the ID's that appear in d -- it only has 3 "matches" and that's totally fine.

d2 <- data.frame(card_id_1999 = c("48","2100","31"),
                card_id_2009 = c("J18","K02","F7"),
                description = c("Wade Boggs","Frank Thomas","Mickey Mantle"),
                stringsAsFactors=T) 

d2 looks like this:

d2

The Problem

I want to join these two tables so I get a table that looks like this:

d_esired

What I've Tried

So of course, I could use left_join with the key being either card_id = card_id_1999 or card_id = card_id_2009, but that only gets me half of what I need, like so:

d_tried <- left_join(d, d2, by = c("card_id" = "card_id_1999"))

Which gives me this:

d_tried

In a sense I'm asking to do 2 joins in one go, but I'm not sure how to do that.

Any thoughts?

Upvotes: 0

Views: 92

Answers (1)

akrun
akrun

Reputation: 886968

If we do the reshape to 'long' format from 'd2', it should work

library(dplyr)
library(tidyr)
d2 %>%
     pivot_longer(cols = starts_with('card'),
       values_to = 'card_id', names_to = NULL) %>% 
     right_join(d) %>%
     select(names(d), everything())

-output

# A tibble: 6 x 3
  card_id almanac_flag description  
  <fct>          <dbl> <fct>        
1 48                 0 Wade Boggs   
2 2100               0 Frank Thomas 
3 F7                 1 Mickey Mantle
4 2729               0 <NA>         
5 F4310              1 <NA>         
6 27700              0 <NA>        

or another option is to match separately for each column (or join separately) and then do a coalesce such as the first non-NA will be selected

d %>% 
   mutate(description = coalesce(d2$description[match(card_id, 
       d2$card_id_1999)], d2$description[match(card_id, d2$card_id_2009)]))
  card_id almanac_flag   description
1      48            0    Wade Boggs
2    2100            0  Frank Thomas
3      F7            1 Mickey Mantle
4    2729            0          <NA>
5   F4310            1          <NA>
6   27700            0          <NA>

Upvotes: 2

Related Questions