Tyler

Reputation: 3

R match two character columns in one tibble with two other character columns in another tibble

Say I have two objects,

mixed
# A tibble: 7 x 2
  genus        epithet   
  <chr>        <chr>     
1 Vincetoxicum nigrum    
2 Rosa         multiflora
3 Quercus      rubra     
4 Acer         saccharum 
5 Rosa         pendula   
6 Vincetoxicum nigrum    
7 Vincetoxicum nigrum

and

invasives
# A tibble: 4 x 2
  genus        epithet   
  <chr>        <chr>     
1 Larix        pendula   
2 Picea        abies     
3 Rosa         multiflora
4 Vincetoxicum nigrum

I want to find the rows of "mixed" where both columns match both columns of some row in "invasives", and get an index that lets me pull those matching rows out of "mixed". Note that "pendula" appears in "epithet" in both "mixed" and "invasives", but its genus is "Larix" in "invasives" and "Rosa" in "mixed", so that row should not be included in the result.

Once that index is created, I would run:

columns_matched <- mixed[index,]

yielding:

columns_matched
# A tibble: 4 x 2
  genus        epithet   
  <chr>        <chr>     
1 Vincetoxicum nigrum    
2 Rosa         multiflora 
3 Vincetoxicum nigrum    
4 Vincetoxicum nigrum 

CSV versions of the tables ("mixed" first, then "invasives"):

genus,epithet
Vincetoxicum,nigrum
Rosa,multiflora
Quercus,rubra
Acer,saccharum
Rosa,pendula
Vincetoxicum,nigrum
Vincetoxicum,nigrum

genus,epithet
Larix,pendula
Picea,abies
Rosa,multiflora
Vincetoxicum,nigrum

Thanks.

Upvotes: 0

Views: 46

Answers (1)

Max Teflon

Reputation: 1800

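The easiest approach that comes to mind is to inner_join the two data sets. With no "by" argument, the join uses every column the two tibbles share (here genus and epithet), so only the rows of mixed whose genus and epithet both match a row of invasives are kept: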

library(tidyverse)
mixed <- read_csv('genus,epithet
Vincetoxicum,nigrum
Rosa,multiflora
Quercus,rubra
Acer,saccharum
Rosa,pendula
Vincetoxicum,nigrum
Vincetoxicum,nigrum')
#> Rows: 7 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): genus, epithet
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

invasives <- read_csv('genus,epithet
Larix,pendula
Picea,abies
Rosa,multiflora
Vincetoxicum,nigrum')
#> Rows: 4 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): genus, epithet
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


mixed %>% 
  inner_join(invasives)
#> Joining, by = c("genus", "epithet")
#> # A tibble: 4 × 2
#>   genus        epithet   
#>   <chr>        <chr>     
#> 1 Vincetoxicum nigrum    
#> 2 Rosa         multiflora
#> 3 Vincetoxicum nigrum    
#> 4 Vincetoxicum nigrum
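
One caveat, as a hedged aside: inner_join returns one row per matching pair, so if invasives ever listed the same species twice, the matching rows of mixed would be repeated. A filtering join such as dplyr's semi_join keeps each row of mixed at most once. The sketch below rebuilds mixed by hand and uses invasives_dup, a hypothetical copy of invasives with one species repeated (it is not part of the question's data), to show the difference:

```r
library(dplyr)

mixed <- tibble(
  genus   = c("Vincetoxicum", "Rosa", "Quercus", "Acer",
              "Rosa", "Vincetoxicum", "Vincetoxicum"),
  epithet = c("nigrum", "multiflora", "rubra", "saccharum",
              "pendula", "nigrum", "nigrum")
)

# Hypothetical: invasives with "Vincetoxicum nigrum" listed twice
invasives_dup <- tibble(
  genus   = c("Larix", "Picea", "Rosa", "Vincetoxicum", "Vincetoxicum"),
  epithet = c("pendula", "abies", "multiflora", "nigrum", "nigrum")
)

# inner_join pairs every matching row of mixed with every matching row of
# invasives_dup (recent dplyr versions may warn about a many-to-many match)
joined <- inner_join(mixed, invasives_dup, by = c("genus", "epithet"))

# semi_join only filters mixed; it never adds or repeats rows
kept <- semi_join(mixed, invasives_dup, by = c("genus", "epithet"))

nrow(joined)
nrow(kept)
```

With the invasives table exactly as posted (no repeated species), both joins return the same four rows.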

If you really want that index, you can add a helper column holding the row number to mixed before joining, then pull it out afterwards:

index <- mixed %>% 
  mutate(index = seq_along(genus)) %>% 
  inner_join(invasives) %>% 
  pull(index)
#> Joining, by = c("genus", "epithet")

mixed[index,]
#> # A tibble: 4 × 2
#>   genus        epithet   
#>   <chr>        <chr>     
#> 1 Vincetoxicum nigrum    
#> 2 Rosa         multiflora
#> 3 Vincetoxicum nigrum    
#> 4 Vincetoxicum nigrum
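
If you would rather compute the index without a join, a minimal base R sketch is to paste the two columns into a single key per row and test membership with %in%. The names key_mixed and key_inv and the "|" separator are illustrative choices, and the sketch assumes "|" never occurs in a genus or epithet:

```r
mixed <- data.frame(
  genus   = c("Vincetoxicum", "Rosa", "Quercus", "Acer",
              "Rosa", "Vincetoxicum", "Vincetoxicum"),
  epithet = c("nigrum", "multiflora", "rubra", "saccharum",
              "pendula", "nigrum", "nigrum")
)
invasives <- data.frame(
  genus   = c("Larix", "Picea", "Rosa", "Vincetoxicum"),
  epithet = c("pendula", "abies", "multiflora", "nigrum")
)

# One combined key per row; "|" is assumed not to appear in the data
key_mixed <- paste(mixed$genus, mixed$epithet, sep = "|")
key_inv   <- paste(invasives$genus, invasives$epithet, sep = "|")

# Positions of the rows of mixed whose genus/epithet pair is in invasives
index <- which(key_mixed %in% key_inv)
index

columns_matched <- mixed[index, ]
columns_matched
```

Dropping which() gives a logical vector instead, which works equally well for subsetting mixed.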

Upvotes: 1
