Khanh Nguyen
Khanh Nguyen

Reputation: 27

How to add pooled standard error in tbl_summary() and eta effect size?

I am trying to include Pooled Standard Error (PSE) and Eta square to tbl_summary(). PSE is calculated by sqrt(mean(residuals^2)/n), I tried to calculate step by step by extracting the residuals from either aov() or lm(), but I got the error saying The dimension of respected variable and the added statistic do not match. Expecting statistic/dataframe to be length/ no. rows 1. Here is my code:

PSE <- function(data, variable, by,...) {
    aov(data[["variable"]] ~ as.factor(data[[by]]))$residuals
  }

Dataset_TPA_Full %>%
  select(diet,hardness_g,adhesiveness_g_sec, resilence, cohesion, springiness, gumminess, chewiness, firmness_g_force_1, density_g_l)%>% 
  tbl_summary(
    by = diet,
    statistic = all_continuous() ~ "{mean} ± {sem}",
    label = list(hardness_g = "Hardness (g)", 
                 adhesiveness_g_sec = "Adhesiveness (g/ sec)", 
                 resilence = "Resilience", 
                 cohesion = "Cohesion", 
                 springiness = "Springiness", 
                 gumminess = "Gumminess", 
                 chewiness = "Chewiness", 
                 firmness_g_force_1 = "Firmness (g)", 
                 density_g_l = "Density (g/ L)")
    ) %>% 
    add_p(
      test = all_continuous() ~ "aov",
    ) %>%
  add_stat(fns = all_continuous() ~ PSE) %>% 
  modify_header(label = "**Treatment**", p.value = "**p-value**") %>%
  bold_labels() %>%
  bold_levels()

Also when I tried to add Eta squared using this code, it return missing data argument when I put it in the add_stat() function

my_ES_test <- function(data, variable, by, ...) {
  aovmod = aov(data[[variable]] ~ data[[by]])
  lsr::etaSquared(aovmod)[1,1]
}

Can you help me with this? Thank you.

Upvotes: 1

Views: 536

Answers (1)

DaveArmstrong
DaveArmstrong

Reputation: 21992

This should do it:

library(gtsummary)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

sem <- function(x){
  sqrt(var(x, na.rm=TRUE)/sum(!is.na(x)))
}

PSE <- function(data, variable, by,...) {
  e <- aov(data[[variable]] ~ as.factor(data[[by]]))$residuals
  sqrt(mean(e^2)/length(e))
}

mtcars %>%
  select(cyl, mpg, hp, disp, drat, qsec)%>% 
  tbl_summary(
    by = cyl,
    statistic = all_continuous() ~ "{mean} ± {sem}",
    label = list(mpg = "Miles per Gallon", 
                 hp = "Horsepower", 
                 disp = "Displacement", 
                 drat = "Rear Axel Ratio", 
                 qsec = "1/4 Mile Time")
  ) %>% 
  add_p(
    test = all_continuous() ~ "aov",
  ) %>%
  add_stat(fns = all_continuous() ~ PSE) %>% 
  modify_header(label = "**Treatment**", p.value = "**p-value**", add_stat_1 = "**PSE**") %>%
  bold_labels() %>%
  bold_levels()

enter image description here

Created on 2022-04-17 by the reprex package (v2.0.1)

Note, the PSE() function had two problems. First, the data[["variable"]] should be data[[variable]] (without the quotes around variable). Second, you had the function return the residuals, not the PSE calculation you described in the question. Now, it returns the appropriate result. I also am not sure where you got the sem() function, so I just made one that calculates the standard error of the mean.


Updated PSE function

PSE <- function(data, variable, by,...) {
  s <- data %>% 
    group_by(!!sym(by)) %>% 
    summarise(s = var(!!sym(variable)), 
              n = n()) %>% 
    mutate(num = s*(n-1))
  psd <- sqrt(sum(s$num)/(sum(s$n) - nrow(s)))
  psd*sqrt(sum(1/s$n))
}

Upvotes: 3

Related Questions