Reputation: 513
I see this comes up in python, panda, and other areas, but I see no help with dplyr for this problem.
require(dplyr)
set.seed(1234L)
i <- 1:80 # years 1 thru 80
S1 <- .02*i # a trend line
S2 <- S1[i] + rnorm(n=80,mean=0,sd=.2) # same line w/ error added
x <- tibble(S1,
S2,
yrs10 = rep(1:8,each=10), # to aggregate by decades (8 groups)
yrs16 = rep(1:5,each = 16), # by 16 yr spans (5 groups)
yrs20 = rep(1:4,each = 20), # by 20 yr spans (4 groups)
yrs40 = rep(1:2,each = 40) # by 40 yr spans (2 groups)
)
This is my problem: I cannot find a way to feed repeated aggregations into the group_by statement without declaring 4 different vars and repeating all the code.
x10 <- x %>%
group_by(yrs10) %>%
summarize(cor=cor(S1,S2)^2) # squared to show prop of var accounted for
x16 <- x %>%
group_by(yrs16) %>%
summarize(cor=cor(S1,S2)^2) # squared to show prop of var accounted for
x20 <- x %>%
group_by(yrs20) %>%
summarize(cor=cor(S1,S2)^2) # squared to show prop of var accounted for
x40 <-x %>%
group_by(yrs40) %>%
summarize(cor=cor(S1,S2)^2) # squared to show prop of var accounted for
x10
x16
x20
x40
There should be a way to feed yrs10 thru yrs40 as a list and collect these results as a list but all I get are errors when I try. The dplyr answers I see here don't seem to cover this scenario. What am I missing?
Upvotes: 0
Views: 175
Reputation: 16876
Another option if you are wanting to return a list is to use map
, where we can feed the column names into the group_by
in the function.
library(tidyverse)
map(names(x[-c(1:2)]), ~
x %>%
group_by(.data[[.x]]) %>%
summarize(cor = cor(S1, S2) ^ 2))
Output
[[1]]
# A tibble: 8 × 2
yrs10 cor
<int> <dbl>
1 1 0.0378
2 2 0.364
3 3 0.0371
4 4 0.00845
5 5 0.00178
6 6 0.420
7 7 0.0386
8 8 0.0281
[[2]]
# A tibble: 5 × 2
yrs16 cor
<int> <dbl>
1 1 0.294
2 2 0.181
3 3 0.317
4 4 0.467
5 5 0.0289
[[3]]
# A tibble: 4 × 2
yrs20 cor
<int> <dbl>
1 1 0.386
2 2 0.135
3 3 0.403
4 4 0.0556
[[4]]
# A tibble: 2 × 2
yrs40 cor
<int> <dbl>
1 1 0.554
2 2 0.660
Additionally, if you did need to get them into the same dataframe, then we could make some adjustments in the function in map
:
map(names(x[-c(1:2)]), ~
x %>%
group_by(.data[[.x]]) %>%
summarize(cor = cor(S1, S2) ^ 2) %>%
mutate(name = colnames(.)[1]) %>%
rename("value" = 1)) %>%
bind_rows()
Output
value cor name
<int> <dbl> <chr>
1 1 0.0378 yrs10
2 2 0.364 yrs10
3 3 0.0371 yrs10
4 4 0.00845 yrs10
5 5 0.00178 yrs10
6 6 0.420 yrs10
7 7 0.0386 yrs10
8 8 0.0281 yrs10
9 1 0.294 yrs16
10 2 0.181 yrs16
11 3 0.317 yrs16
12 4 0.467 yrs16
13 5 0.0289 yrs16
14 1 0.386 yrs20
15 2 0.135 yrs20
16 3 0.403 yrs20
17 4 0.0556 yrs20
18 1 0.554 yrs40
19 2 0.660 yrs40
Upvotes: 2
Reputation: 24845
You can pivot longer, and do on each group
x %>% pivot_longer(cols = starts_with("yrs")) %>%
group_by(name,value) %>%
summarize(cor = cor(S1,S2)^2)
Output:
name value cor
<chr> <int> <dbl>
1 yrs10 1 0.0378
2 yrs10 2 0.364
3 yrs10 3 0.0371
4 yrs10 4 0.00845
5 yrs10 5 0.00178
6 yrs10 6 0.420
7 yrs10 7 0.0386
8 yrs10 8 0.0281
9 yrs16 1 0.294
10 yrs16 2 0.181
11 yrs16 3 0.317
12 yrs16 4 0.467
13 yrs16 5 0.0289
14 yrs20 1 0.386
15 yrs20 2 0.135
16 yrs20 3 0.403
17 yrs20 4 0.0556
18 yrs40 1 0.554
19 yrs40 2 0.660
and, if you want it back in wide format, simply add:
... %>%
pivot_wider(id_cols=value, names_from=name, values_from = cor)
Output:
value yrs10 yrs16 yrs20 yrs40
<int> <dbl> <dbl> <dbl> <dbl>
1 1 0.0378 0.294 0.386 0.554
2 2 0.364 0.181 0.135 0.660
3 3 0.0371 0.317 0.403 NA
4 4 0.00845 0.467 0.0556 NA
5 5 0.00178 0.0289 NA NA
6 6 0.420 NA NA NA
7 7 0.0386 NA NA NA
8 8 0.0281 NA NA NA
Finally, could also do something like this, if you wanted each one in a list
f <- function(d,g) {
d %>% group_by(across(all_of(g))) %>% summarize(cor=cor(S1,S2)^2)
}
lapply(colnames(x)[3:6], function(n) f(x,n))
Output:
[[1]]
# A tibble: 8 x 2
yrs10 cor
<int> <dbl>
1 1 0.0378
2 2 0.364
3 3 0.0371
4 4 0.00845
5 5 0.00178
6 6 0.420
7 7 0.0386
8 8 0.0281
[[2]]
# A tibble: 5 x 2
yrs16 cor
<int> <dbl>
1 1 0.294
2 2 0.181
3 3 0.317
4 4 0.467
5 5 0.0289
[[3]]
# A tibble: 4 x 2
yrs20 cor
<int> <dbl>
1 1 0.386
2 2 0.135
3 3 0.403
4 4 0.0556
[[4]]
# A tibble: 2 x 2
yrs40 cor
<int> <dbl>
1 1 0.554
2 2 0.660
Upvotes: 2