Artem Sokolov
Artem Sokolov

Reputation: 13691

tidyr::expand() for a single column across groups

tidyr::expand() returns all possible combinations of values from multiple columns. I'm looking for a slightly different behavior, where all the values are in a single column and the combinations are to be taken across groups.

For example, let the data be defined as follows:

library( tidyverse )
X <- bind_rows( data_frame(Group = "Group1", Value = LETTERS[1:3]),
                data_frame(Group = "Group2", Value = letters[4:5]) )

We want all combinations of values from Group1 with values from Group2. My current clunky solution is to separate the values across multiple columns

Y <- X %>% group_by(Group) %>% do(vals = .$Value) %>% spread(Group, vals)
# # A tibble: 1 x 2
#   Group1    Group2   
#   <list>    <list>   
# 1 <chr [3]> <chr [2]>

followed by a double unnest operation

Y %>% unnest( .preserve = Group2 ) %>% unnest
# # A tibble: 6 x 2
#   Group1 Group2
#   <chr>  <chr> 
# 1 A      d     
# 2 A      e     
# 3 B      d     
# 4 B      e     
# 5 C      d     
# 6 C      e     

This is the desired output, but as you can imagine, this solution doesn't generalize well: as the number of groups increases, so does the number of unnest operations that we have to perform.

Is there a more elegant solution?

Upvotes: 5

Views: 2989

Answers (4)

Henrik
Henrik

Reputation: 67778

Because OP seems happy to use base, I upgrade my comment to an answer:

expand.grid(split(X$Value, X$Group))
#   Group1 Group2
# 1      A      d
# 2      B      d
# 3      C      d
# 4      A      e
# 5      B      e
# 6      C      e

As noted by OP, expand.grid converts character vectors to factors. To prevent that, use stringsAsFactors = FALSE.

The tidyverse equivalent is purrr::cross_df, which doesn't coerce to factor:

cross_df(split(X$Value, X$Group))
# A tibble: 6 x 2
# Group1 Group2
# <chr>  <chr> 
# 1 A      d     
# 2 B      d     
# 3 C      d     
# 4 A      e     
# 5 B      e     
# 6 C      e

Upvotes: 5

hpesoj626
hpesoj626

Reputation: 3619

This still works with expand after spread.

X %>%
  mutate(id = row_number()) %>%
  spread(Group, Value) %>%
  expand(Group1, Group2) %>%
  na.omit()

Upvotes: 1

pasipasi
pasipasi

Reputation: 1226

I often use tidyr::crossing() to join all values from group2 to group.

data_frame(group = c(LETTERS[1:3])) %>% 
  crossing(group2 = letters[4:5])

I might do something like this:

data %>% 
  distinct(group) %>% 
  crossing(group2)

A more specific example:

dates <- lubridate::make_date(2000:2018)

data_frame(group = letters[1:5]) %>% 
  crossing(dates)

Upvotes: 1

www
www

Reputation: 39154

Here is one option. It will work on the cases with more than two groups although complete_ is deprecated.

library( tidyverse )
X2 <- X %>%
  group_by(Group) %>%
  mutate(ID = 1:n()) %>%
  spread(Group, Value) %>%
  select(-ID) %>%
  complete_(names(.)) %>%
  na.omit()
X2
# # A tibble: 6 x 2
#   Group1 Group2
#   <chr>  <chr> 
# 1 A      d     
# 2 A      e     
# 3 B      d     
# 4 B      e     
# 5 C      d     
# 6 C      e 

Update

!!!syms(names(.)) works well with the regular complete function, thus is better than using complete_ as my original solution.

library( tidyverse )
X2 <- X %>%
  group_by(Group) %>%
  mutate(ID = 1:n()) %>%
  spread(Group, Value) %>%
  select(-ID) %>%
  complete(!!!syms(names(.))) %>%
  na.omit()
X2
# # A tibble: 6 x 2
#   Group1 Group2
#   <chr>  <chr> 
# 1 A      d     
# 2 A      e     
# 3 B      d     
# 4 B      e     
# 5 C      d     
# 6 C      e 

Upvotes: 3

Related Questions