I've got a dataset with four levels: observations (i.e. a period in which a teacher is observed), teachers, schools, and school divisions. Observations are nested within teachers, who are nested within schools, etc.
Each row in the data corresponds to one instance of a teacher being observed.
At each level of the hierarchy, I want to compute the mean, sd, min, and max for each of several variables (x1, x2, and x3 in the simulated data, but there are ~12 in the actual data). And I want all of these summaries in a single dataframe.
The code below will do it, but it feels clunky to me. More specifically, a few things are bothering me:
- I couldn't work out how to use rename within the function I wrote using the group_var value, so I resorted to manually doing this outside of the functions.
- I have to use left_join to join them together at the end (again manually).
- I suspect there is a way to use purrr to "peel back" layers of hierarchy and aggregate, but it's eluding me.

Any advice on how to streamline this, and particularly how to pass the group_var values to rename_at, would be much appreciated!
library(tidyverse)
library(treemap)

df <- random.hierarchical.data(n = 200, depth = 4) %>%
  rename(div = index1,
         sch = index2,
         teacher = index3,
         obs = index4,
         x1 = x) %>%
  mutate(x2 = rlnorm(200),
         x3 = rlnorm(200))

sum_func <- function(data, sum_vars, ...) {
  group_vars <- enquos(...)
  data %>%
    group_by(!!!group_vars) %>%
    summarize_at(vars(sum_vars),
                 list(
                   ~mean(., na.rm = TRUE),
                   ~sd(., na.rm = TRUE),
                   ~min(., na.rm = TRUE),
                   ~max(., na.rm = TRUE)
                 )) %>%
    ungroup()
}

use_vars <- c("x1", "x2", "x3")

teacher_sum <- sum_func(data = df, sum_vars = use_vars, div, sch, teacher) %>%
  rename_at(vars(-c("teacher", "sch", "div")), ~str_replace_all(., "^", "teacher_"))

sch_sum <- sum_func(df, sum_vars = use_vars, div, sch) %>%
  rename_at(vars(-c("sch", "div")), ~str_replace_all(., "^", "sch_"))

div_sum <- sum_func(df, sum_vars = use_vars, div) %>%
  rename_at(vars(-c("div")), ~str_replace_all(., "^", "div_"))

full <- teacher_sum %>%
  left_join(sch_sum, by = c("sch", "div")) %>%
  left_join(div_sum, by = "div")
Upvotes: 1
Reputation: 6954
You were quite close. The code below works, but I am unsure how to automate the joining completely, since the logic behind it is not clear to me.

sum_func <- function(data, sum_vars, replacement, ...) {
  group_vars <- enquos(...)
  data %>%
    group_by(!!!group_vars) %>%
    summarize_at(vars(sum_vars),
                 list(
                   ~mean(., na.rm = TRUE),
                   ~sd(., na.rm = TRUE),
                   ~min(., na.rm = TRUE),
                   ~max(., na.rm = TRUE)
                 )) %>%
    ungroup() %>%
    rename_at(vars(-c(!!!group_vars)),
              ~str_replace_all(., "^", replacement))
}

use_vars <- c("x1", "x2", "x3")

teacher_sum <- sum_func(data = df,
                        sum_vars = use_vars,
                        replacement = "teacher_",
                        div, sch, teacher)

sch_sum <- sum_func(data = df,
                    sum_vars = use_vars,
                    replacement = "sch_",
                    div, sch)

div_sum <- sum_func(df,
                    sum_vars = use_vars,
                    replacement = "div_",
                    div)
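
If the joining logic is simply "each level is keyed by itself plus every level above it", then one possible way to automate it is sketched below. This is only a sketch under assumptions that are not from the question: it uses dplyr >= 1.0 (for across()), a small made-up data frame standing in for df, and a hypothetical helper called summarise_level. Each level's summaries are joined on whatever key columns the two tables share.

```r
library(dplyr)
library(purrr)

# Made-up stand-in for the question's `df` (assumption): one row per observation,
# 2 divisions, 4 schools, 8 teachers, 2 observations per teacher.
set.seed(1)
df <- tibble(
  div     = rep(c("A", "B"), each = 8),
  sch     = rep(c("A1", "A2", "B1", "B2"), each = 4),
  teacher = rep(paste0("t", 1:8), each = 2),
  x1 = rnorm(16), x2 = rlnorm(16), x3 = rlnorm(16)
)

lvls     <- c("div", "sch", "teacher")  # ordered from outermost to innermost
use_vars <- c("x1", "x2", "x3")

# Hypothetical helper: summarise at level i, grouping by that level and every
# level above it, and prefix the summary columns with the level name.
summarise_level <- function(data, i, lvls, sum_vars) {
  keys   <- lvls[seq_len(i)]
  prefix <- lvls[i]
  data %>%
    group_by(across(all_of(keys))) %>%
    summarise(
      across(all_of(sum_vars),
             list(mean = ~mean(.x, na.rm = TRUE),
                  sd   = ~sd(.x, na.rm = TRUE),
                  min  = ~min(.x, na.rm = TRUE),
                  max  = ~max(.x, na.rm = TRUE)),
             .names = paste0(prefix, "_{.col}_{.fn}")),
      .groups = "drop"
    )
}

# Build the summaries from the innermost level outwards, then fold them
# together, joining each pair on the key columns they have in common.
full <- rev(seq_along(lvls)) %>%
  map(function(i) summarise_level(df, i, lvls, use_vars)) %>%
  reduce(function(x, y) left_join(x, y, by = intersect(names(x), names(y))))
```

Because the summary columns carry a different prefix at every level, the only columns two tables share are the grouping keys, which is why intersect(names(x), names(y)) is a safe choice for by here. With the real data, only lvls and use_vars should need changing.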
Upvotes: 2