Reputation: 1211
Background
I've got two dataframes about baseball cards and their market value. This information comes from Baseball Card "Almanacs", guides to cards' value published every year.
The first, d
, is a table with the card_id
of each card, as well as an indicator almanac_flag
, which tells you if the card_id
in that row came from the either the 1999 or 2009 editions of the Baseball Card Almanac:
d <- data.frame(card_id = c("48","2100","F7","2729","F4310","27700"),
almanac_flag = c(0,0,1,0,1,0), # 0 = 1999 Almanac, 1 = 2009 almanac
stringsAsFactors=T)
It looks like this:
The second dataframe is d2
, which contains (not all) equivalent id
's for 1999 and 2009, along with a description of which baseball player is depicted in that card. Note that d2
doesn't have all the ID's that appear in d
-- it only has 3 "matches" and that's totally fine.
d2 <- data.frame(card_id_1999 = c("48","2100","31"),
card_id_2009 = c("J18","K02","F7"),
description = c("Wade Boggs","Frank Thomas","Mickey Mantle"),
stringsAsFactors=T)
d2
looks like this:
The Problem
I want to join these two tables so I get a table that looks like this:
What I've Tried
So of course, I could use left_join
with the key being either card_id = card_id_1999
or card_id = card_id_2009
, but that only gets me half of what I need, like so:
d_tried <- left_join(d, d2, by = c("card_id" = "card_id_1999"))
Which gives me this:
In a sense I'm asking to do 2 joins in one go, but I'm not sure how to do that.
Any thoughts?
Upvotes: 0
Views: 92
Reputation: 886968
If we do the reshape to 'long' format from 'd2', it should work
library(dplyr)
library(tidyr)
d2 %>%
pivot_longer(cols = starts_with('card'),
values_to = 'card_id', names_to = NULL) %>%
right_join(d) %>%
select(names(d), everything())
-output
# A tibble: 6 x 3
card_id almanac_flag description
<fct> <dbl> <fct>
1 48 0 Wade Boggs
2 2100 0 Frank Thomas
3 F7 1 Mickey Mantle
4 2729 0 <NA>
5 F4310 1 <NA>
6 27700 0 <NA>
or another option is to match
separately for each column (or join separately) and then do a coalesce
such as the first non-NA will be selected
d %>%
mutate(description = coalesce(d2$description[match(card_id,
d2$card_id_1999)], d2$description[match(card_id, d2$card_id_2009)]))
card_id almanac_flag description
1 48 0 Wade Boggs
2 2100 0 Frank Thomas
3 F7 1 Mickey Mantle
4 2729 0 <NA>
5 F4310 1 <NA>
6 27700 0 <NA>
Upvotes: 2