Reputation: 3568
I have gone through the vignette on row-wise operations for the new dplyr v1.0.0 and am intrigued by the possibilities of the nest_by function for modelling within different silos of a dataset.
However, I am having difficulty getting a repeated-measures analysis to work.
Here's an example to illustrate a case where it does work:
df1 <- data.frame(group = factor(rep(LETTERS[1:3],10)),
pred = factor(rep(letters[1:2],each=5,length.out=30)),
out = rnorm(30))
Now create the nesting based on the group variable.
library(dplyr)
nest1 <- df1 %>% nest_by(group)
We can view this new special nested data frame
nest1
# A tibble: 3 x 2
# Rowwise: group
# group data
# <fct> <list<tbl_df[,2]>>
# 1 A [10 x 2]
# 2 B [10 x 2]
# 3 C [10 x 2]
Now we can perform operations on it, like a linear regression of out on pred within each level of the original group variable.
mods <- nest1 %>% mutate(mod = list(lm(out ~ pred, data = data)))
In this new object we have added a new column, containing the lm() objects, to the original nested dataset
mods
# A tibble: 3 x 3
# Rowwise: group
# group data mod
# <fct> <list<tbl_df[,2]>> <list>
# 1 A [10 x 2] <lm>
# 2 B [10 x 2] <lm>
# 3 C [10 x 2] <lm>
And we can view the results of these models
library(broom)
mods %>% summarise(broom::tidy(mod))
# A tibble: 6 x 6
# Groups: group [3]
# group term estimate std.error statistic p.value
# <fct> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 A (Intercept) 0.0684 0.295 0.232 0.823
# 2 A predb -0.231 0.418 -0.553 0.595
# 3 B (Intercept) -0.159 0.447 -0.356 0.731
# 4 B predb 0.332 0.633 0.524 0.615
# 5 C (Intercept) -0.385 0.245 -1.57 0.154
# 6 C predb 0.891 0.346 2.58 0.0329
Now I would like to do the same thing, but with a repeated-measures t-test.
# dataset with grouping factor and two columns, each representing a measure at one of two timepoints
df2 <- data.frame(group = factor(rep(letters[1:3],10)),
t1 = rnorm(30),
t2 = rnorm(30))
# nest by grouping factor
nest2 <- df2 %>% nest_by(group)
nest2
# A tibble: 3 x 2
# Rowwise: group
# group data
# <fct> <list<tbl_df[,2]>>
# 1 a [10 x 2]
# 2 b [10 x 2]
# 3 c [10 x 2]
Now when I try to perform a paired t-test at each level of the new nested dataset, using a similar procedure to the linear model...
mods2 <- nest2 %>% mutate(t = list(t.test(t1, t2, data = data)))
...I get the following error message
Error: Problem with `mutate()` input `t`.
x object 't1' not found
i Input `t` is `list(t.test(t1, t2, data = data))`.
i The error occured in row 1.
Run `rlang::last_error()` to see where the error occurred.
Can anyone help me?
Upvotes: 3
Views: 125
Reputation: 887213
The data argument is only used by the formula method of t.test(). The default S3 method takes x and y as vectors and has no data argument, so t1 and t2 are not found. We can wrap the call in with()
library(dplyr)
nest2 %>%
mutate(t = list(with(data, t.test(t1, t2))))
# A tibble: 3 x 3
# Rowwise: group
# group data t
# <fct> <list<tbl_df[,2]>> <list>
#1 a [10 × 2] <htest>
#2 b [10 × 2] <htest>
#3 c [10 × 2] <htest>
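To see the formula method's data argument at work, the paired test can also be written with Pair(), which the stats package provides from R 4.0.0 onward. This is only a minimal sketch: it assumes R >= 4.0.0, rebuilds the question's df2 with an arbitrary seed, and the names mods_formula and t_formula are purely illustrative.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df2 <- data.frame(group = factor(rep(letters[1:3], 10)),
                  t1 = rnorm(30),
                  t2 = rnorm(30))

# The formula method does use `data`, so t1 and t2 are looked up
# inside each nested tibble. Pair(t1, t2) ~ 1 requests a paired test.
mods_formula <- df2 %>%
  nest_by(group) %>%
  mutate(t_formula = list(t.test(Pair(t1, t2) ~ 1, data = data)))

# One <htest> object per group; inspect the first one
mods_formula$t_formula[[1]]
```

This keeps the call shape close to the lm() example in the question, at the cost of requiring a recent R version.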
Or use the extractors ($ or [[)
nest2 %>%
mutate(t = list(t.test(data$t1, data$t2)))
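Note that t.test() runs a Welch two-sample test by default, so for the repeated-measures comparison described in the question, paired = TRUE is probably what is wanted. Below is a sketch under that assumption, with the results flattened by broom::tidy() in the same way as the lm() example in the question. The seed and the names paired and test are illustrative, not part of the original code.

```r
library(dplyr)
library(broom)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df2 <- data.frame(group = factor(rep(letters[1:3], 10)),
                  t1 = rnorm(30),
                  t2 = rnorm(30))

# Paired t-test of t1 against t2 within each level of group
paired <- df2 %>%
  nest_by(group) %>%
  mutate(test = list(t.test(data$t1, data$t2, paired = TRUE)))

# tidy() returns one row per <htest>, so this gives one row per group
paired_results <- paired %>% summarise(broom::tidy(test))
paired_results
```

The resulting tibble has one row per group, with columns such as estimate, statistic and p.value.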
Upvotes: 3