Reputation: 567
I'd like to run a mutate for all combinations of a subset of columns of a grouped dataframe via tidyverse.
Here's the setup:
library(tidyverse)
a <- c(2018, 2019, 2020) #years
b <- c(23, 24, 25, 26) #cities
c <- c(45, 46, 47, 48, 49) #jobs
rows <- expand.grid(a, b, c)
colnames(rows) <- c('a', 'b', 'c')
d <- sample(c(0,1), nrow(rows), replace = TRUE)
obs <- cbind(rows, d) %>% group_by(a, b) %>% mutate(c = str_c("J", c))
wide <- obs %>% spread(c, d)
Which produces this:
A grouped_df: 12 × 7
a b J45 J46 J47 J48 J49
2018 23 1 0 1 1 1
2018 24 0 0 1 1 0
2018 25 0 1 0 1 1
2018 26 1 0 1 1 1
2019 23 0 0 1 1 1
2019 24 0 1 0 0 1
2019 25 1 0 0 1 0
2019 26 0 1 1 1 1
2020 23 0 1 0 1 0
2020 24 1 0 1 0 1
2020 25 1 0 0 1 0
2020 26 0 1 0 0 0
The dataframe data
is grouped by a
and b
, and I want to find the sum of all combinations of the J*
columns.
I've been referring to this [tidyverse: Chi Square for all combinations of columns for guidance, but it doesn't use grouping.
I've expanded the dataframe to include all combinations with
combos <- data.frame(t(combn(unique(obs$c), 2)))
data <- merge(wide, combos) %>% mutate(X1 = as.character(X1), X2 = as.character(X2))
and head(data, 10) produces this:
a b J45 J46 J47 J48 J49 X1 X2
1 2018 23 1 0 1 1 1 J45 J46
2 2018 24 0 0 1 1 0 J45 J46
3 2018 25 0 1 0 1 1 J45 J46
4 2018 26 1 0 1 1 1 J45 J46
5 2019 23 0 0 1 1 1 J45 J46
6 2019 24 0 1 0 0 1 J45 J46
7 2019 25 1 0 0 1 0 J45 J46
8 2019 26 0 1 1 1 1 J45 J46
9 2020 23 0 1 0 1 0 J45 J46
10 2020 24 1 0 1 0 1 J45 J46
Now I want the sum for each row of the value in the X1 column and the value in the X2 column. I can't seem to make either map2
or unquoted eval work. What am I missing?
Thank you for your attention.
EDIT: Some additional info, results of what I've tried and desired output (@camille).
With data %>% mutate(d = map2(X1, X2, sum(.x, .y)))
I get
Error in as_mapper(.f, ...): object '.x' not found
and with data %>% mutate(d = sum(!!sym(X1), !!sym(X2)))
I get
Error in is_symbol(x): object 'X1' not found.
I don't understand why it can't find these elements, esp in the map2
case.
As output, I need to retain the grouping values and column combinations with the sum value (ultimately a weighted edge list). Both outputs from the responses by @andrew_reece and @42- are fine, but I am curious as to why something more straightforward along the lines of "add the values in these two columns row wise" doesn't work.
Upvotes: 0
Views: 296
Reputation: 263342
I'm guessing you want something like this:
data$sum2s <- rowSums( cbind( #make one two-column matrix
# by indexing with two other two-column matrices below
as.numeric( data[cbind( 1:nrow(data), match(data$X1,names(data)))]),
as.numeric( data[cbind( 1:nrow(data), match(data$X2,names(data)))])
) )
> str(data)
'data.frame': 120 obs. of 10 variables:
$ a : num 2018 2018 2018 2018 2019 ...
$ b : num 23 24 25 26 23 24 25 26 23 24 ...
$ J45 : num 1 1 1 1 1 0 1 1 1 1 ...
$ J46 : num 0 1 0 0 0 0 0 1 1 1 ...
$ J47 : num 1 1 1 0 0 1 1 0 0 1 ...
$ J48 : num 0 1 0 0 0 1 0 0 1 1 ...
$ J49 : num 1 1 0 1 0 1 1 0 0 1 ...
$ X1 : chr "J45" "J45" "J45" "J45" ...
$ X2 : chr "J46" "J46" "J46" "J46" ...
$ sum2s: num 1 2 1 1 1 0 1 2 2 2 ...
It uses indexing with a two-column matrix, which allows you to pull out non-contiguous values, instead of needing to have all values from the same row or column. See?"["
Upvotes: 0
Reputation: 21264
Here's one approach: pivot_longer
to filter
down to only the combo values you want, then group_by
and sum
:
data %>%
pivot_longer(starts_with("J"), names_to = "var", values_to = "val") %>%
filter(var == X1 | var == X2) %>%
group_by(a, b, X1, X2) %>%
summarise(combo_val = sum(val))
You could also pivot_wider
after filtering, and get combo sums via rowSums
:
data %>%
pivot_longer(starts_with("J"), names_to = "var", values_to = "val") %>%
filter(var == X1 | var == X2) %>%
pivot_wider (id_cols = c(a, b, X1, X2),
names_from = "var",
values_from = "val",
values_fill = list(val = 0)) %>%
mutate(combo_val = rowSums(select(., starts_with("J")))) %>%
select(a, b, X1, X2, combo_val)
In either case, the output is the same:
# A tibble: 120 x 5
a b X1 X2 combo_val
<dbl> <dbl> <chr> <chr> <dbl>
1 2018 23 J45 J46 1
2 2018 24 J45 J46 2
3 2018 25 J45 J46 0
4 2018 26 J45 J46 2
5 2019 23 J45 J46 2
6 2019 24 J45 J46 0
7 2019 25 J45 J46 0
8 2019 26 J45 J46 0
9 2020 23 J45 J46 0
10 2020 24 J45 J46 2
# … with 110 more rows
Upvotes: 1