Reputation: 1460
Using the dplyr full_join()
operation, I am trying to perform the equivalent of a basic merge()
operation in which no common variables exist (unable to satisfy the "by=" argument). This will blend two data frames and return all possible combinations.
However, the current full_join()
function requires a common variable. I am unable to locate another dplyr function that can help with this. How can I perform this operation using functions specific to the dplyr library?
df_a = data.frame(department=c(1,2,3,4))
df_b = data.frame(period=c(2014,2015,2016,2017))
#This works as desired
big_df = merge(df_a,df_b)
#I'd like to perform the following in a much bigger operation:
big_df = dplyr::full_join(df_a,df_b)
#Error: No common variables. Please specify `by` param.
Upvotes: 15
Views: 10758
Reputation: 3700
If there are duplicate rows, crossing
doesn't give the same result as merge
.
Instead use full_join
with by = character()
to perform a cross-join which generates all combinations of df_a
and df_b
.
library("tidyverse") # version 1.3.2
# Add duplicate rows for illustration.
df_a <- tibble(department = c(1, 2, 3, 3))
df_b <- tibble(period = c(2014, 2015, 2016, 2017))
merge
doesn't de-duplicate.
df_a_merge_b <- merge(df_a, df_b)
df_a_merge_b
#> department period
#> 1 1 2014
#> 2 2 2014
#> 3 3 2014
#> 4 3 2014
#> 5 1 2015
#> 6 2 2015
#> 7 3 2015
#> 8 3 2015
#> 9 1 2016
#> 10 2 2016
#> 11 3 2016
#> 12 3 2016
#> 13 1 2017
#> 14 2 2017
#> 15 3 2017
#> 16 3 2017
crossing
drops duplicate rows.
df_a_crossing_b <- crossing(df_a, df_b)
df_a_crossing_b
#> # A tibble: 12 × 2
#> department period
#> <dbl> <dbl>
#> 1 1 2014
#> 2 1 2015
#> 3 1 2016
#> 4 1 2017
#> 5 2 2014
#> 6 2 2015
#> 7 2 2016
#> 8 2 2017
#> 9 3 2014
#> 10 3 2015
#> 11 3 2016
#> 12 3 2017
full_join
doesn't remove duplicates either.
df_a_full_join_b <- full_join(df_a, df_b, by = character())
df_a_full_join_b
#> # A tibble: 16 × 2
#> department period
#> <dbl> <dbl>
#> 1 1 2014
#> 2 1 2015
#> 3 1 2016
#> 4 1 2017
#> 5 2 2014
#> 6 2 2015
#> 7 2 2016
#> 8 2 2017
#> 9 3 2014
#> 10 3 2015
#> 11 3 2016
#> 12 3 2017
#> 13 3 2014
#> 14 3 2015
#> 15 3 2016
#> 16 3 2017
packageVersion("tidyverse")
#> [1] '1.3.2'
Created on 2023-01-13 with reprex v2.0.2
Upvotes: 0
Reputation: 16287
You can use crossing
from tidyr
:
crossing(df_a,df_b)
department period
1 1 2014
2 1 2015
3 1 2016
4 1 2017
5 2 2014
6 2 2015
7 2 2016
8 2 2017
9 3 2014
10 3 2015
11 3 2016
12 3 2017
13 4 2014
14 4 2015
15 4 2016
16 4 2017
Upvotes: 40