Reputation: 811
I am using nested data frames to nest by certain groups and then run t tests on the factors and values within the $data column. However, for some conditions, I end up with fewer than two factor levels in the $data column. The t test then cannot be run, and the code produces an error for the whole data frame. In the example below, groups a to d have both treatments available for comparison, but group e does not. How can I specify that the t test should only be run on rows where both treatments are available?
library(dplyr)
library(purrr)
library(tidyr)

set.seed(1)
df <- data.frame(id = paste0('ID-', 1:100),
                 group = rep(c('a', 'b', 'c', 'd', 'e'), each = 20),
                 treatment = c(rep(c('x', 'y'), 40), rep('x', 20)),
                 value = runif(100))
df_analysis <- df %>%
  nest(-group) %>%
  # How can I run the t test only on rows that have both treatments? As written, this gives an error.
  mutate(p = map_dbl(data, ~t.test(value ~ treatment, data = .)$p.value))
Upvotes: 0
Views: 423
Reputation: 23608
Since you are already using some tidyverse packages, you can use purrr's wrapper functions to handle errors. In this case you can use possibly(),
which wraps a function so that it returns a default value whenever an error occurs.
Using your code:
library(dplyr)
library(purrr)
library(tidyr)
set.seed(1)
# tibble() replaces the deprecated data_frame(); dplyr re-exports it
df <- tibble(id = paste0('ID-', 1:100),
             group = rep(c('a', 'b', 'c', 'd', 'e'), each = 20),
             treatment = c(rep(c('x', 'y'), 40), rep('x', 20)),
             value = runif(100))
df_analysis <- df %>%
  nest(-group) %>%
  # possibly() returns NA_real_ instead of raising an error when t.test() fails
  mutate(p = map_dbl(data, possibly(~t.test(value ~ treatment, data = .)$p.value, NA_real_)))
# A tibble: 5 x 3
group data p
<chr> <list> <dbl>
1 a <tibble [20 x 3]> 0.610
2 b <tibble [20 x 3]> 0.156
3 c <tibble [20 x 3]> 0.840
4 d <tibble [20 x 3]> 0.383
5 e <tibble [20 x 3]> NA
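To see what possibly() does in isolation, here is a minimal sketch outside the t test example. The name safe_log is hypothetical and only used for illustration; log() stands in for any function that may throw an error.

```r
library(purrr)

# safe_log is a hypothetical name chosen for this illustration.
# possibly() returns a wrapped version of log() that yields `otherwise`
# instead of stopping when the wrapped call raises an error.
safe_log <- possibly(log, otherwise = NA_real_)

# log("a") raises an error, so the third element becomes NA_real_
res <- map_dbl(list(1, 10, "a"), safe_log)
```

One thing to keep in mind: possibly() replaces every error with the default, not only the "fewer than two levels" case, so an unrelated problem in a group would also show up silently as NA.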
Upvotes: 2
Reputation: 13591
Wrap the t.test(...) call in ifelse(), checking that the number of unique items in treatment is == 2:
df %>%
  nest(-group) %>%
  # run the t test only when both treatments are present; otherwise return NA
  mutate(p = map_dbl(data, ~ifelse(length(unique(.x$treatment)) == 2,
                                   t.test(value ~ treatment, data = .x)$p.value,
                                   NA_real_)))
# A tibble: 5 x 3
# group data p
# <fct> <list> <dbl>
# 1 a <data.frame [20 x 3]> 0.790
# 2 b <data.frame [20 x 3]> 0.0300
# 3 c <data.frame [20 x 3]> 0.712
# 4 d <data.frame [20 x 3]> 0.662
# 5 e <data.frame [20 x 3]> NA
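The same check can also be written with a plain if/else inside a small helper, which may read more clearly than ifelse() for a single condition. This is only a sketch under a few assumptions: p_or_na is a hypothetical helper name, n_distinct() from dplyr is used in place of length(unique(...)), and group_by() followed by nest() is used in place of nest(-group).

```r
library(dplyr)
library(purrr)
library(tidyr)

set.seed(1)
df <- data.frame(id = paste0('ID-', 1:100),
                 group = rep(c('a', 'b', 'c', 'd', 'e'), each = 20),
                 treatment = c(rep(c('x', 'y'), 40), rep('x', 20)),
                 value = runif(100))

# p_or_na is a hypothetical helper: it returns the t test p-value when
# exactly two treatments are present in the nested data, and NA_real_ otherwise.
p_or_na <- function(d) {
  if (n_distinct(d$treatment) == 2) {
    t.test(value ~ treatment, data = d)$p.value
  } else {
    NA_real_
  }
}

df_analysis <- df %>%
  group_by(group) %>%
  nest() %>%
  ungroup() %>%
  mutate(p = map_dbl(data, p_or_na))
```

Because the condition is checked before t.test() is called, any other error inside a group would still surface instead of being turned into NA.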
Upvotes: 1