Reputation: 954
I want to make a function saving the output in a list. My concern is how to specify which column with dplyr group_by and summaries. Imagine a big dataset. See further comments below.
trt <- rep(LETTERS[1:3],3)
qw <- sample(100,9)
tr <- sample(100,9)
df <- data.frame(trt,qw,tr)
df %>%
group_by(trt) %>%
summarise(mean.mpg = mean(qw, na.rm = TRUE),
sd.mpg = sd(qw, na.rm = TRUE),
n.mpg = n())%>%
mutate(se.mpg = sd.mpg / sqrt(n.mpg),
lower.ci.mpg = mean.mpg - qt(1 - (0.05 / 2), n.mpg - 1) * se.mpg,
upper.ci.mpg = mean.mpg + qt(1 - (0.05 / 2), n.mpg - 1) * se.mpg)
Why does n[i] not work? How should this be specified to be able to use it in a function?
n <- colnames(df)[2:3]
df %>%
group_by(trt) %>%
summarise(mean.mpg = mean(n[i], na.rm = TRUE),
sd.mpg = sd(n[i], na.rm = TRUE),
n.mpg = n())%>%
mutate(se.mpg = sd.mpg / sqrt(n.mpg),
lower.ci.mpg = mean.mpg - qt(1 - (0.05 / 2), n.mpg - 1) * se.mpg,
upper.ci.mpg = mean.mpg + qt(1 - (0.05 / 2), n.mpg - 1) * se.mpg)
In the end I want to make this loop and save the output in a list
list_Data <- list()
for (i in 2:ncol(df)){
list_Data[[i]]<- df %>%
group_by(trt) %>%
summarise(mean.mpg = mean(n[i], na.rm = TRUE),
sd.mpg = sd(n[i], na.rm = TRUE),
n.mpg = n())%>%
mutate(se.mpg = sd.mpg / sqrt(n.mpg),
lower.ci.mpg = mean.mpg - qt(1 - (0.05 / 2), n.mpg - 1) * se.mpg,
upper.ci.mpg = mean.mpg + qt(1 - (0.05 / 2), n.mpg - 1) * se.mpg)
}
Wanted output:
[[1]]
# A tibble: 3 x 7
trt mean.mpg sd.mpg n.mpg se.mpg lower.ci.mpg upper.ci.mpg
<chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 A 35.7 32.9 3 19.0 -46.0 117.
2 B 46 37.2 3 21.5 -46.5 139.
3 C 64.3 47.8 3 27.6 -54.4 183.
[[2]]
# A tibble: 3 x 7
trt mean.mpg sd.mpg n.mpg se.mpg lower.ci.mpg upper.ci.mpg
<chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 A 57.7 40.5 3 23.4 -42.8 158.
2 B 49.3 31.0 3 17.9 -27.7 126.
3 C 32.7 34.8 3 20.1 -53.8 119.
Upvotes: 2
Views: 1091
Reputation: 13731
One option is to index the .data
pronoun, which can be done when column names are stored as strings:
for (i in 1:length(n)){ # <-- Note the change from 2:ncol(df)
list_Data[[i]]<- df %>%
group_by(trt) %>%
summarise(mean.mpg = mean(.data[[n[i]]], na.rm = TRUE), # <-- .data pronoun here
sd.mpg = sd(.data[[n[i]]], na.rm = TRUE), # and here
n.mpg = n())%>%
mutate(se.mpg = sd.mpg / sqrt(n.mpg),
lower.ci.mpg = mean.mpg - qt(1 - (0.05 / 2), n.mpg - 1) * se.mpg,
upper.ci.mpg = mean.mpg + qt(1 - (0.05 / 2), n.mpg - 1) * se.mpg)
}
Upvotes: 1
Reputation: 21938
You have to use sym
function from rlang
package and then unquote it with !!
called bang bang operator. Here because you are providing a string as column name you need to first turn it into a symbol which is in fact pointing to an object in your data set and then you have to tell R to evaluate it in the context of your data set by means of forcing evaluation with !!
operator. You can find out the basics of programming with tidyverse here.
library(rlang)
n <- colnames(df)[2:3]
list_Data <- vector("list", length = ncol(df) - 1)
for (i in 1:(ncol(df)-1)){
list_Data[[i]] <- df %>%
group_by(trt) %>%
summarise(mean.mpg = mean(!!sym(n[i]), na.rm = TRUE),
sd.mpg = sd(!!sym(n[i]), na.rm = TRUE),
n.mpg = n()) %>%
mutate(se.mpg = sd.mpg / sqrt(n.mpg),
lower.ci.mpg = mean.mpg - qt(1 - (0.05 / 2), n.mpg - 1) * se.mpg,
upper.ci.mpg = mean.mpg + qt(1 - (0.05 / 2), n.mpg - 1) * se.mpg)
}
> list_Data
[[1]]
# A tibble: 3 x 7
trt mean.mpg sd.mpg n.mpg se.mpg lower.ci.mpg upper.ci.mpg
<chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 A 62.7 30.6 3 17.7 -13.4 139.
2 B 36.7 14.0 3 8.09 1.86 71.5
3 C 17.7 16.5 3 9.53 -23.3 58.7
[[2]]
# A tibble: 3 x 7
trt mean.mpg sd.mpg n.mpg se.mpg lower.ci.mpg upper.ci.mpg
<chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 A 49 38.0 3 21.9 -45.4 143.
2 B 36 21.7 3 12.5 -17.8 89.8
3 C 30 26.9 3 15.5 -36.8 96.8
Upvotes: 1