Reputation: 343
Say we have two different datasets:
Dataset A:
ids name price
1234 bread 1.5
245r7 butter 1.2
123984 red wine 5
43498 beer 1
235897 cream 1.8
Dataset B:
ids name price
24908 lait 1
1234,089 pain 1.7
77289,43498 bière 1.5
245r7 beurre 1.4
My goal is to match all the products sharing at least one ID and bring them together into a new dataset that should look as follows:
id a_name b_name a_price b_price
1234 bread pain 1.5 1.7
245r7 butter beurre 1.2 1.4
43498 beer bière 1 1.5
Is this feasible using stringr
or any other R package?
Upvotes: 2
Views: 51
Reputation: 78917
As separate_rows
(my favorite) is already provided by Ronak Shah,
Here is another strategy using strsplit
and unnest()
:
library(tidyr)
library(dplyr)
df_B %>%
mutate(ids = strsplit(as.character(ids), ",")) %>%
unnest() %>%
inner_join(df_A, by="ids")
ids name.x price.x name.y price.y
<chr> <chr> <dbl> <chr> <chr>
1 1234 pain 1.7 bread 1.5
2 43498 bi??re 1.5 beer 1
3 245r7 beurre 1.4 butter 1.2
data:
df_A <- structure(list(ids = c("1234", "245r7", "123984", "43498", "235897"
), name = c("bread", "butter", "red", "beer", "cream"), price = c("1.5",
"1.2", "wine", "1", "1.8")), class = c("spec_tbl_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -5L), problems = structure(list(
row = 3L, col = NA_character_, expected = "3 columns", actual = "4 columns",
file = "'test'"), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame")))
df_B <- structure(list(ids = c("24908", "1234,089", "77289,43498", "245r7"
), name = c("lait", "pain", "bi??re", "beurre"), price = c(1,
1.7, 1.5, 1.4)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L))
Upvotes: 0
Reputation: 388817
You can create a long dataset with separate_rows
and then do a join.
library(dplyr)
library(tidyr)
B %>%
separate_rows(ids, sep = ',') %>%
inner_join(A, by = 'ids')
# ids name.x price.x name.y price.y
# <chr> <chr> <dbl> <chr> <dbl>
#1 1234 pain 1.7 bread 1.5
#2 43498 bière 1.5 beer 1
#3 245r7 beurre 1.4 butter 1.2
Upvotes: 5
Reputation: 520908
We can use the sqldf
package here:
library(sqldf)
sql <- "SELECT a.ids AS id, a.name AS a_name, b.name AS b_name, a.price AS a_price,
b.price AS b_price
FROM df_a a
INNER JOIN df_b b
ON ',' || b.ids || ',' LIKE '%,' || a.ids || ',%'"
output <- sqldf(sql)
Upvotes: 2