Reputation: 567

R tidyverse mutate with all combinations of subset of columns in grouped dataframe

I'd like to run a mutate for all combinations of a subset of columns of a grouped dataframe via tidyverse.

Here's the setup:

library(tidyverse)
a <- c(2018, 2019, 2020) #years
b <- c(23, 24, 25, 26) #cities
c <- c(45, 46, 47, 48, 49) #jobs
rows <- expand.grid(a, b, c) 
colnames(rows) <- c('a', 'b', 'c')
d <-  sample(c(0,1), nrow(rows), replace = TRUE)
obs <- cbind(rows, d) %>% group_by(a, b) %>% mutate(c = str_c("J", c))
wide <- obs %>% spread(c, d)

Which produces this:

A grouped_df: 12 × 7
a       b   J45 J46 J47 J48 J49
2018    23  1   0   1   1   1
2018    24  0   0   1   1   0
2018    25  0   1   0   1   1
2018    26  1   0   1   1   1
2019    23  0   0   1   1   1
2019    24  0   1   0   0   1
2019    25  1   0   0   1   0
2019    26  0   1   1   1   1
2020    23  0   1   0   1   0
2020    24  1   0   1   0   1
2020    25  1   0   0   1   0
2020    26  0   1   0   0   0

The dataframe data is grouped by a and b, and I want to find the sum of all combinations of the J* columns.

I've been referring to this [tidyverse: Chi Square for all combinations of columns for guidance, but it doesn't use grouping.

I've expanded the dataframe to include all combinations with

combos <- data.frame(t(combn(unique(obs$c), 2)))
data <- merge(wide, combos) %>% mutate(X1 = as.character(X1), X2 = as.character(X2))

and head(data, 10) produces this:

    a       b   J45 J46 J47 J48 J49 X1  X2
1   2018    23  1   0   1   1   1   J45 J46
2   2018    24  0   0   1   1   0   J45 J46
3   2018    25  0   1   0   1   1   J45 J46
4   2018    26  1   0   1   1   1   J45 J46
5   2019    23  0   0   1   1   1   J45 J46
6   2019    24  0   1   0   0   1   J45 J46
7   2019    25  1   0   0   1   0   J45 J46
8   2019    26  0   1   1   1   1   J45 J46
9   2020    23  0   1   0   1   0   J45 J46
10  2020    24  1   0   1   0   1   J45 J46

Now I want the sum for each row of the value in the X1 column and the value in the X2 column. I can't seem to make either map2 or unquoted eval work. What am I missing?

Thank you for your attention.

EDIT: Some additional info, results of what I've tried and desired output (@camille).

With data %>% mutate(d = map2(X1, X2, sum(.x, .y))) I get

Error in as_mapper(.f, ...): object '.x' not found

and with data %>% mutate(d = sum(!!sym(X1), !!sym(X2))) I get

Error in is_symbol(x): object 'X1' not found.

I don't understand why it can't find these elements, esp in the map2 case.

As output, I need to retain the grouping values and column combinations with the sum value (ultimately a weighted edge list). Both outputs from the responses by @andrew_reece and @42- are fine, but I am curious as to why something more straightforward along the lines of "add the values in these two columns row wise" doesn't work.

Upvotes: 0

Answers (2)

IRTFM

Reputation: 263481

I'm guessing you want something like this:

data$sum2s <- rowSums( cbind(   #make one two-column matrix
                             # by indexing with two other two-column matrices below
               as.numeric( data[cbind( 1:nrow(data), match(data$X1,names(data)))]), 
               as.numeric( data[cbind( 1:nrow(data), match(data$X2,names(data)))])
                       ) ) 


> str(data)
'data.frame':   120 obs. of  10 variables:
 $ a    : num  2018 2018 2018 2018 2019 ...
 $ b    : num  23 24 25 26 23 24 25 26 23 24 ...
 $ J45  : num  1 1 1 1 1 0 1 1 1 1 ...
 $ J46  : num  0 1 0 0 0 0 0 1 1 1 ...
 $ J47  : num  1 1 1 0 0 1 1 0 0 1 ...
 $ J48  : num  0 1 0 0 0 1 0 0 1 1 ...
 $ J49  : num  1 1 0 1 0 1 1 0 0 1 ...
 $ X1   : chr  "J45" "J45" "J45" "J45" ...
 $ X2   : chr  "J46" "J46" "J46" "J46" ...
 $ sum2s: num  1 2 1 1 1 0 1 2 2 2 ...

It uses indexing with a two-column matrix, which allows you to pull out non-contiguous values, instead of needing to have all values from the same row or column. See?"["

Upvotes: 0

andrew_reece

Reputation: 21284

Here's one approach: pivot_longer to filter down to only the combo values you want, then group_by and sum:

data %>%
  pivot_longer(starts_with("J"), names_to = "var", values_to = "val") %>%
  filter(var == X1 | var == X2) %>% 
  group_by(a, b, X1, X2) %>%
  summarise(combo_val = sum(val))

You could also pivot_wider after filtering, and get combo sums via rowSums:

data %>%
  pivot_longer(starts_with("J"), names_to = "var", values_to = "val") %>%
  filter(var == X1 | var == X2) %>% 
  pivot_wider (id_cols = c(a, b, X1, X2), 
               names_from = "var", 
               values_from = "val", 
               values_fill = list(val = 0)) %>% 
  mutate(combo_val = rowSums(select(., starts_with("J")))) %>%
  select(a, b, X1, X2, combo_val)

In either case, the output is the same:

# A tibble: 120 x 5
       a     b X1    X2    combo_val
   <dbl> <dbl> <chr> <chr>     <dbl>
 1  2018    23 J45   J46           1
 2  2018    24 J45   J46           2
 3  2018    25 J45   J46           0
 4  2018    26 J45   J46           2
 5  2019    23 J45   J46           2
 6  2019    24 J45   J46           0
 7  2019    25 J45   J46           0
 8  2019    26 J45   J46           0
 9  2020    23 J45   J46           0
10  2020    24 J45   J46           2
# … with 110 more rows

Upvotes: 1

R tidyverse mutate with all combinations of subset of columns in grouped dataframe

Answers (2)

Related Questions