llewmills


Problem with running paired t-test within nested dplyr dataset

I have gone through the vignette on row-wise operations for the new dplyr v1.0.0 and am intrigued by the possibilities of the nest_by() function for modelling within different silos of a dataset.

However, I am having difficulty getting a repeated-measures analysis to work.

Here's an example to illustrate a case where it does work:

df1 <- data.frame(group = factor(rep(LETTERS[1:3],10)),
                  pred = factor(rep(letters[1:2],each=5,length.out=30)),
                  out = rnorm(30))

Now create the nesting based on the group variable.

library(dplyr)
nest1 <- df1 %>% nest_by(group)
nest1

We can view this new special nested data frame:

# A tibble: 3 x 2
# Rowwise:  group
#   group               data
#   <fct> <list<tbl_df[,2]>>
# 1 A               [10 x 2]
# 2 B               [10 x 2]
# 3 C               [10 x 2]

Now we can perform operations on it, like a linear regression, regressing out on pred within each level of the original group variable.

mods <- nest1 %>% mutate(mod = list(lm(out ~ pred, data = data)))

This adds a new list column, mod, to the nested dataset, holding the fitted lm() object for each group:

mods

# A tibble: 3 x 3
# Rowwise:  group
#   group               data mod
#   <fct> <list<tbl_df[,2]>> <list>
# 1 A               [10 x 2] <lm>
# 2 B               [10 x 2] <lm>
# 3 C               [10 x 2] <lm>

And we can view the results of these models

library(broom)
mods %>% summarise(broom::tidy(mod))
# A tibble: 6 x 6
# Groups:   group [3]
#   group term        estimate std.error statistic p.value
#   <fct> <chr>          <dbl>     <dbl>     <dbl>   <dbl>
# 1 A     (Intercept)   0.0684     0.295     0.232  0.823 
# 2 A     predb        -0.231      0.418    -0.553  0.595 
# 3 B     (Intercept)  -0.159      0.447    -0.356  0.731 
# 4 B     predb         0.332      0.633     0.524  0.615 
# 5 C     (Intercept)  -0.385      0.245    -1.57   0.154 
# 6 C     predb         0.891      0.346     2.58   0.0329

Now I would like to be able to do the same thing but with a repeated measures t-test.

# dataset with grouping factor and two columns, each representing a measure at one of two timepoints
df2 <- data.frame(group = factor(rep(letters[1:3],10)),
                  t1 = rnorm(30),
                  t2 = rnorm(30))

# nest by grouping factor
nest2 <- df2 %>% nest_by(group)
nest2

# A tibble: 3 x 2
# Rowwise:  group
#   group               data
#   <fct> <list<tbl_df[,2]>>
# 1 a               [10 x 2]
# 2 b               [10 x 2]
# 3 c               [10 x 2]

Now when I try to perform a paired t-test at each level of the new nested dataset, using a similar procedure to the linear model...

mods2 <- nest2 %>% mutate(t = list(t.test(t1, t2, data = data)))

...I get the following error message

Error: Problem with `mutate()` input `t`.
x object 't1' not found
i Input `t` is `list(t.test(t1, t2, data = data))`.
i The error occured in row 1.
Run `rlang::last_error()` to see where the error occurred.

Can anyone help me?

Upvotes: 3


Answers (1)

akrun


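The data argument is only used by the formula method of t.test(). The default S3 method takes the vectors x and y directly, so in your call t1 and t2 are evaluated as columns of nest2, where they do not exist. We can wrap the call in with() so that they are looked up inside data. Since the question asks for a paired test, paired = TRUE is added here; without it, t.test() runs a Welch two-sample test.

library(dplyr)
nest2 %>%
      mutate(t = list(with(data, t.test(t1, t2, paired = TRUE))))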
# A tibble: 3 x 3
# Rowwise:  group
#  group               data t      
#  <fct> <list<tbl_df[,2]>> <list> 
#1 a               [10 × 2] <htest>
#2 b               [10 × 2] <htest>
#3 c               [10 × 2] <htest>     

Or use extractors ($, [[)

nest2 %>% 
    # paired = TRUE requests the repeated-measures (paired) test
    mutate(t = list(t.test(data$t1, data$t2, paired = TRUE)))
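
To get a tidy table of results, mirroring the lm() example in the question, something along these lines should work. This is only a sketch: the seed, the column name tt, and the object name res are assumptions added for illustration, and paired = TRUE is included because the question asks for a paired test.

```r
library(dplyr)
library(broom)

set.seed(1)  # assumption: seed added only to make the sketch reproducible

# same structure as df2 in the question: one row per subject, two timepoints
df2 <- data.frame(group = factor(rep(letters[1:3], 10)),
                  t1 = rnorm(30),
                  t2 = rnorm(30))

res <- df2 %>%
  nest_by(group) %>%
  # store one htest object per group; `tt` is a hypothetical column name
  mutate(tt = list(t.test(data$t1, data$t2, paired = TRUE))) %>%
  # broom::tidy() turns each htest into a one-row data frame
  summarise(broom::tidy(tt))

res
```

Each group contributes one row, with columns such as estimate, statistic, p.value and method, so the result can be filtered or plotted like any other data frame.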

Upvotes: 3
