ifoxfoot
ifoxfoot

Reputation: 223

Left-join two data frames by one column. If no matches are returned, join by another column

I want to join a data frame to another data frame first by one column, then, if there are no "matches" I want it to try to join by another column. The problem is similar to this question, but I'm trying to get a slightly different output.

Here are my 'observations'

#my df
my_plants <- data.frame(scientific_name = c("Abelmoschus esculentus",
                                            "Abies balsamea",
                                            "Ammophila breviligulata",
                                            "Zigadenus glaucus"),
                        percent_cover = c(90, 80, 10, 60))

and here is the main list with some data that I want to extract for each of my observations. Obviously this is simplified.

#hypothetical database
plant_database <- data.frame(scientific_name = c("Abelmoschus esculentus",
                                                 "Abies balsamea",
                                                 "Ammophila breviligulata",
                                                 "Anticlea elegans"),
                             synonym = c(NA_character_,
                                         NA_character_,
                                         NA_character_,
                                         "Zigadenus glaucus"),
                             score = c(1, 1, 2, 6))

Here is a function to join my observations to the main list. Note: I'm using a left_join because I want to know which observations were not matched.

#joining function
joining_fun <- function(plants, database) {

  database_long <- database %>%
    dplyr::mutate(ID = row.names(.)) %>%
    tidyr::pivot_longer(., cols = c(scientific_name, synonym),
                        values_to = "scientific_name")

  join <- dplyr::left_join(plants, database_long, by = "scientific_name") %>% 
    dplyr::select(-name)

  return(join)

}

Which gets me here:

joining_fun(my_plants, plant_database)

scientific_name percent_cover score ID
1  Abelmoschus esculentus            90     1  1
2          Abies balsamea            80     1  2
3 Ammophila breviligulata            10     2  3
4       Zigadenus glaucus            60     6  4

but I want something like this:

scientific_name           synonym           percent_cover score ID  
Abelmoschus esculentus    NA                90     1  1          
Abies balsamea            NA                80     1  2  
Ammophila breviligulata   NA                10     2  3        
Anticlea elegans          Zigadenus glaucus 60     6  4

Thanks!

Upvotes: 2

Views: 958

Answers (1)

zephryl
zephryl

Reputation: 17039

  1. Use inner_join() to create a df of only cases that match on scientific_name.
  2. Use anti_join() to get a version of plants that don't match on scientific_name.
  3. Do another inner_join() of database with these unmatched cases, using key "synonym" = "scientific_name".
  4. Do one more anti_join() to get cases without a match in either column.
  5. Finally, bind all results together.
library(dplyr)

# add test case with no match in either column
my_plants <- add_row(
  my_plants, 
  scientific_name = "Stackus overflovius", 
  percent_cover = 0
)

joining_fun <- function(plants, database) {
  by_sci_name <- inner_join(plants, database, by = "scientific_name")
  no_sci_match <- anti_join(plants, database, by = "scientific_name")
  by_syn <- inner_join(database, no_sci_match, by = c("synonym" = "scientific_name"))
  no_match <- anti_join(no_sci_match, database, by = c("scientific_name" = "synonym"))
  bind_rows(by_syn, by_sci_name, no_match)
}

joining_fun(my_plants, plant_database)
          scientific_name           synonym score percent_cover
1        Anticlea elegans Zigadenus glaucus     6            60
2  Abelmoschus esculentus              <NA>     1            90
3          Abies balsamea              <NA>     1            80
4 Ammophila breviligulata              <NA>     2            10
5     Stackus overflovius              <NA>    NA             0

Upvotes: 2

Related Questions