calculate summary statistics with repeated measures / long data in R

Question

Apologies if this has been asked elsewhere or if I am using the wrong terms. I have been searching for the correct way to do this, but with no success so far.

I have an experimental design with 3 experimental conditions using repeated measures outcomes (each participant completes 4 trials). The data I have currently is in long format (each participant ID is repeated 4 times). I am trying to calculate summary statistics for the demographic variables (age, gender, condition etc.) but I cannot figure out how to, for lack of a better word, collapse/merge the rows for each participant together to get the frequency data and/or summary stats.

Below I have a simulated dataset

require(tidyverse)
require(summarytools)
require(skimr)
require(lme4)
require(wakefield) #to simulate age distribution
require(reshape2)

id <- rep(1:150, each = 4)
age <- rep(age(150, x = 18:21), each = 4)
gender <- rep(c("male", "male", "male", "male", "female", "female","female","female"), each = 25, times = 3)
condition <- rep(c("condition_1", "condition_2", "condition_3"), each = 4, times = 50) #condition
control_1 <- rep(c("order_1", "order_2"), each = 4, length.out = 600) # control variable for counterbalancing
control_2 <- rep(c("group_1", "group_2"), each = 75, length.out = 600) # control variable for counterbalancing
test1_trial <- rep(c("trial_1", "trial_2", "trial_3", "trial_4"), each = 1, length.out = 600)
test1_outcome <- rbinom(600, 1, 0.5) # actual data
test2_trial <- rep(c("trial_1", "trial_2", "trial_3", "trial_4"), each = 1, length.out = 600)
test2_outcome <- rbinom(600, 1, 0.5) # actual data

dat <- data.frame(id, age, gender, condition, control_1, control_2, test1_trial, test1_outcome, test2_trial, test2_outcome)

I have tried using group_by like so

dat %>% 
  group_by(id) %>% 
  freq(age)

but this gives me each id as a separate group which is obviously not helpful for summary statistics.

I also tried using summarise_all but could not get it to work

dat$id <- as.factor(dat$id)

dat %>% 
  select(id, age)
  group_by(id) %>% 
  summarise_all(funs(sum))

Error in UseMethod("group_by") : no applicable method for 'group_by' applied to an object of class "c('integer', 'numeric')"

For the summary statistics, I don't care about the actual data (i.e. test1_outcome and test2_outcome), I just want to be able to calculate e.g., the mean age, number of participants per condition etc. Is there a way I can somehow select just the control/demographic variables and collapse them for each participant?

Apologies for the basic question, I do not usually work with repeated measures designs and so am not super familiar with long format data.

knawhatimean · Accepted Answer

If your demographic data don't vary across treatment rounds, you can just run distinct() or unique() by id, similar to what Jon Spring suggested, like this:

dat %>% 
  distinct(id, age, gender)
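
The unique() route mentioned above can be sketched in base R as follows. This is only a minimal illustration: the `toy` data frame and its values are made up for the example and are not the asker's data, and it assumes the demographic columns are constant within each participant.

```r
# Hypothetical long-format data: 3 participants, 4 trials each
toy <- data.frame(
  id        = rep(1:3, each = 4),
  age       = rep(c(18, 20, 21), each = 4),
  gender    = rep(c("male", "female", "female"), each = 4),
  condition = rep(c("condition_1", "condition_2", "condition_1"), each = 4),
  outcome   = c(0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1)
)

# Keep only the participant-level columns, then drop the repeated rows
demo <- unique(toy[, c("id", "age", "gender", "condition")])

nrow(demo)             # number of participants (one row each)
mean(demo$age)         # mean age across participants
table(demo$condition)  # participants per condition
```

Because the trial-level columns are dropped before calling unique(), the four rows for each participant become identical and collapse into one.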

You could then group by condition (or whatever other variable you want) to get the summary statistics along with the count of participants:

dat %>% 
  distinct(id, age, gender, condition) %>% 
  group_by(condition, gender) %>% 
  # summarise only the columns of interest; averaging id would be meaningless
  summarise(n = n(), mean_age = mean(age))
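
Since this approach relies on the demographics not varying within a participant, it may be worth checking that assumption before summarising. The sketch below uses a made-up `toy` data frame (not the asker's data) and hypothetical object names. It verifies that distinct() leaves exactly one row per id, then computes the number of participants per condition and the overall mean age.

```r
library(dplyr)

# Hypothetical long-format data: 3 participants, 4 trials each
toy <- data.frame(
  id        = rep(1:3, each = 4),
  age       = rep(c(18, 20, 21), each = 4),
  condition = rep(c("condition_1", "condition_2", "condition_1"), each = 4),
  outcome   = c(0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1)
)

# One row per participant, keeping only participant-level columns
participants <- toy %>%
  distinct(id, age, condition)

# If a demographic value changed between trials, an id would appear
# more than once here and this check would fail
stopifnot(nrow(participants) == n_distinct(toy$id))

# Number of participants in each condition
per_condition <- participants %>% count(condition)

# Overall sample size, mean age and standard deviation of age
overall <- participants %>%
  summarise(n = n(), mean_age = mean(age), sd_age = sd(age))

print(per_condition)
print(overall)
```

If the check fails, some participant has conflicting demographic values across rows, and those rows would need to be inspected before any summary is trusted.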
