Reputation: 223
I want to join a data frame to another data frame first by one column, then, if there are no "matches" I want it to try to join by another column. The problem is similar to this question, but I'm trying to get a slightly different output.
Here are my 'observations'
#my df
my_plants <- data.frame(scientific_name = c("Abelmoschus esculentus",
"Abies balsamea",
"Ammophila breviligulata",
"Zigadenus glaucus"),
percent_cover = c(90, 80, 10, 60))
and here is the main list
with some data that I want to extract for each of my observations. Obviously this is simplified.
#hypothetical database
plant_database <- data.frame(scientific_name = c("Abelmoschus esculentus",
"Abies balsamea",
"Ammophila breviligulata",
"Anticlea elegans"),
synonym = c(NA_character_,
NA_character_,
NA_character_,
"Zigadenus glaucus"),
score = c(1, 1, 2, 6))
Here is a function to join my observations to the main list. Note: I'm using a left_join because I want to know which observations were not matched.
#joining function
joining_fun <- function(plants, database) {
database_long <- database %>%
dplyr::mutate(ID = row.names(.)) %>%
tidyr::pivot_longer(., cols = c(scientific_name, synonym),
values_to = "scientific_name")
join <- dplyr::left_join(plants, database_long, by = "scientific_name") %>%
dplyr::select(-name)
return(join)
}
Which gets me here:
joining_fun(my_plants, plant_database)
scientific_name percent_cover score ID
1 Abelmoschus esculentus 90 1 1
2 Abies balsamea 80 1 2
3 Ammophila breviligulata 10 2 3
4 Zigadenus glaucus 60 6 4
but I want something like this:
scientific_name synonym percent_cover score ID
Abelmoschus esculentus NA 90 1 1
Abies balsamea NA 80 1 2
Ammophila breviligulata NA 10 2 3
Anticlea elegans Zigadenus glaucus 60 6 4
Thanks!
Upvotes: 2
Views: 958
Reputation: 17039
inner_join()
to create a df of only cases that match on scientific_name
.anti_join()
to get a version of plants
that don't match on scientific_name
.inner_join()
of database
with these unmatched cases, using key "synonym" = "scientific_name"
.anti_join()
to get cases without a match in either column.library(dplyr)
# add test case with no match in either column
my_plants <- add_row(
my_plants,
scientific_name = "Stackus overflovius",
percent_cover = 0
)
joining_fun <- function(plants, database) {
by_sci_name <- inner_join(plants, database, by = "scientific_name")
no_sci_match <- anti_join(plants, database, by = "scientific_name")
by_syn <- inner_join(database, no_sci_match, by = c("synonym" = "scientific_name"))
no_match <- anti_join(no_sci_match, database, by = c("scientific_name" = "synonym"))
bind_rows(by_syn, by_sci_name, no_match)
}
joining_fun(my_plants, plant_database)
scientific_name synonym score percent_cover
1 Anticlea elegans Zigadenus glaucus 6 60
2 Abelmoschus esculentus <NA> 1 90
3 Abies balsamea <NA> 1 80
4 Ammophila breviligulata <NA> 2 10
5 Stackus overflovius <NA> NA 0
Upvotes: 2