teogj
teogj

Reputation: 343

Match same strings over two different vectors

Say we have two different datasets:

Dataset A:

ids        name          price
1234       bread         1.5
245r7      butter        1.2
123984     red wine      5
43498      beer          1
235897     cream         1.8

Dataset B:

ids          name       price
24908        lait       1
1234,089     pain       1.7
77289,43498  bière      1.5
245r7        beurre     1.4

My goal is to match all the products sharing at least one ID and bring them together into a new dataset that should look as follows:

id       a_name      b_name     a_price      b_price
1234     bread       pain       1.5          1.7
245r7    butter      beurre     1.2          1.4
43498    beer        bière      1            1.5

Is this feasible using stringr or any other R package?

Upvotes: 2

Views: 51

Answers (3)

TarJae
TarJae

Reputation: 78917

As separate_rows (my favorite) is already provided by Ronak Shah, Here is another strategy using strsplit and unnest():

library(tidyr)
library(dplyr)
df_B %>% 
  mutate(ids = strsplit(as.character(ids), ",")) %>% 
  unnest() %>% 
  inner_join(df_A, by="ids")
  ids   name.x price.x name.y price.y
  <chr> <chr>    <dbl> <chr>  <chr>  
1 1234  pain       1.7 bread  1.5    
2 43498 bi??re     1.5 beer   1      
3 245r7 beurre     1.4 butter 1.2   

data:

df_A <- structure(list(ids = c("1234", "245r7", "123984", "43498", "235897"
), name = c("bread", "butter", "red", "beer", "cream"), price = c("1.5", 
"1.2", "wine", "1", "1.8")), class = c("spec_tbl_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -5L), problems = structure(list(
row = 3L, col = NA_character_, expected = "3 columns", actual = "4 columns", 
file = "'test'"), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame")))

df_B <- structure(list(ids = c("24908", "1234,089", "77289,43498", "245r7"
), name = c("lait", "pain", "bi??re", "beurre"), price = c(1, 
1.7, 1.5, 1.4)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L))

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388817

You can create a long dataset with separate_rows and then do a join.

library(dplyr)
library(tidyr)

B %>%
  separate_rows(ids, sep = ',') %>%
  inner_join(A, by = 'ids')

#   ids   name.x price.x name.y price.y
#  <chr> <chr>    <dbl> <chr>    <dbl>
#1 1234  pain       1.7 bread      1.5
#2 43498 bière      1.5 beer       1  
#3 245r7 beurre     1.4 butter     1.2

Upvotes: 5

Tim Biegeleisen
Tim Biegeleisen

Reputation: 520908

We can use the sqldf package here:

library(sqldf)

sql <- "SELECT a.ids AS id, a.name AS a_name, b.name AS b_name, a.price AS a_price,
               b.price AS b_price
        FROM df_a a
        INNER JOIN df_b b
            ON ',' || b.ids || ',' LIKE '%,' || a.ids || ',%'"
output <- sqldf(sql)

Upvotes: 2

Related Questions