Reputation: 13691
tidyr::expand()
returns all possible combinations of values from multiple columns. I'm looking for a slightly different behavior, where all the values are in a single column and the combinations are to be taken across groups.
For example, let the data be defined as follows:
library( tidyverse )
X <- bind_rows( data_frame(Group = "Group1", Value = LETTERS[1:3]),
data_frame(Group = "Group2", Value = letters[4:5]) )
We want all combinations of values from Group1
with values from Group2
. My current clunky solution is to separate the values across multiple columns
Y <- X %>% group_by(Group) %>% do(vals = .$Value) %>% spread(Group, vals)
# # A tibble: 1 x 2
# Group1 Group2
# <list> <list>
# 1 <chr [3]> <chr [2]>
followed by a double unnest
operation
Y %>% unnest( .preserve = Group2 ) %>% unnest
# # A tibble: 6 x 2
# Group1 Group2
# <chr> <chr>
# 1 A d
# 2 A e
# 3 B d
# 4 B e
# 5 C d
# 6 C e
This is the desired output, but as you can imagine, this solution doesn't generalize well: as the number of groups increases, so does the number of unnest
operations that we have to perform.
Is there a more elegant solution?
Upvotes: 5
Views: 2989
Reputation: 67778
Because OP seems happy to use base
, I upgrade my comment to an answer:
expand.grid(split(X$Value, X$Group))
# Group1 Group2
# 1 A d
# 2 B d
# 3 C d
# 4 A e
# 5 B e
# 6 C e
As noted by OP, expand.grid
converts character vectors to factors. To prevent that, use stringsAsFactors = FALSE
.
The tidyverse
equivalent is purrr::cross_df
, which doesn't coerce to factor:
cross_df(split(X$Value, X$Group))
# A tibble: 6 x 2
# Group1 Group2
# <chr> <chr>
# 1 A d
# 2 B d
# 3 C d
# 4 A e
# 5 B e
# 6 C e
Upvotes: 5
Reputation: 3619
This still works with expand
after spread
.
X %>%
mutate(id = row_number()) %>%
spread(Group, Value) %>%
expand(Group1, Group2) %>%
na.omit()
Upvotes: 1
Reputation: 1226
I often use tidyr::crossing()
to join all values from group2
to group
.
data_frame(group = c(LETTERS[1:3])) %>%
crossing(group2 = letters[4:5])
I might do something like this:
data %>%
distinct(group) %>%
crossing(group2)
A more specific example:
dates <- lubridate::make_date(2000:2018)
data_frame(group = letters[1:5]) %>%
crossing(dates)
Upvotes: 1
Reputation: 39154
Here is one option. It will work on the cases with more than two groups although complete_
is deprecated.
library( tidyverse )
X2 <- X %>%
group_by(Group) %>%
mutate(ID = 1:n()) %>%
spread(Group, Value) %>%
select(-ID) %>%
complete_(names(.)) %>%
na.omit()
X2
# # A tibble: 6 x 2
# Group1 Group2
# <chr> <chr>
# 1 A d
# 2 A e
# 3 B d
# 4 B e
# 5 C d
# 6 C e
Update
!!!syms(names(.))
works well with the regular complete
function, thus is better than using complete_
as my original solution.
library( tidyverse )
X2 <- X %>%
group_by(Group) %>%
mutate(ID = 1:n()) %>%
spread(Group, Value) %>%
select(-ID) %>%
complete(!!!syms(names(.))) %>%
na.omit()
X2
# # A tibble: 6 x 2
# Group1 Group2
# <chr> <chr>
# 1 A d
# 2 A e
# 3 B d
# 4 B e
# 5 C d
# 6 C e
Upvotes: 3