user42485

Reputation: 811

Within nested data frame, filter rows that contain specific strings

I am using nested data frames to nest by certain groups and then run t tests on the factors and values within the data list-column. However, for some groups, the data column ends up with only one level of the factor. The t test then cannot be run, and the error aborts the code for the whole data frame. In the example below, groups a to d have both treatments available for comparison, but group e does not. How can I specify that the t test should only be run on rows where both treatments are available?

library(dplyr)
library(tidyr)
library(purrr)

set.seed(1)
df <- data.frame(id = paste0('ID-', 1:100),
                 group = rep(c('a', 'b', 'c', 'd', 'e'), each = 20),
                 treatment = c(rep(c('x', 'y'), 40), rep('x', 20)),
                 value = runif(100))

df_analysis <- df %>% 
  nest(-group) %>% 
  #How to ask to only run t test on rows that have both treatments in them? As written, it will give an error.
  mutate(p = map_dbl(data, ~t.test(value ~ treatment, data=.)$p.value))

Upvotes: 0

Views: 423

Answers (2)

phiver

Reputation: 23608

Since you are already using some tidyverse packages, you can use purrr's helpers for capturing errors. In this case, possibly() wraps a function so that it returns a default value whenever an error occurs instead of stopping.

Using your code:

library(dplyr)
library(purrr)
library(tidyr)

set.seed(1)
# tibble() replaces the deprecated data_frame()
df <- tibble(id = paste0('ID-', 1:100),
             group = rep(c('a', 'b', 'c', 'd', 'e'), each = 20),
             treatment = c(rep(c('x', 'y'), 40), rep('x', 20)),
             value = runif(100))

df_analysis <- df %>% 
  nest(-group) %>% 
  # possibly() returns NA_real_ for groups where t.test() throws an error
  mutate(p = map_dbl(data, possibly(~t.test(value ~ treatment, data=.)$p.value, NA_real_)))

df_analysis

# A tibble: 5 x 3
  group data                   p
  <chr> <list>             <dbl>
1 a     <tibble [20 x 3]>  0.610
2 b     <tibble [20 x 3]>  0.156
3 c     <tibble [20 x 3]>  0.840
4 d     <tibble [20 x 3]>  0.383
5 e     <tibble [20 x 3]> NA    
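
To see what possibly() does on its own, here is a minimal sketch outside the nested pipeline. The helper name safe_p and the two small data frames are made up for illustration; they are not part of the question's data.

```r
library(purrr)

# Hypothetical helper: p-value of the t test, or NA_real_ if t.test() errors
safe_p <- possibly(
  function(d) t.test(value ~ treatment, data = d)$p.value,
  otherwise = NA_real_
)

# Illustrative data: one frame with both treatments, one with only 'x'
both_treatments <- data.frame(treatment = rep(c('x', 'y'), 10),
                              value = seq_len(20) / 20)
one_treatment   <- data.frame(treatment = rep('x', 20),
                              value = seq_len(20) / 20)

safe_p(both_treatments)  # a numeric p-value
safe_p(one_treatment)    # NA, because t.test() needs exactly 2 groups
```

Note that possibly() swallows every error, not just the "only one treatment" case, so an unrelated problem in a group would also silently become NA.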

Upvotes: 2

CPak

Reputation: 13591

Wrap the t.test(...) call in ifelse(), checking that the number of unique values in treatment is exactly 2. For groups with only one treatment, the test is skipped and NA is returned:

df %>% 
  nest(-group) %>% 
  mutate(p = map_dbl(data, ~ifelse(length(unique(.x$treatment)) == 2,
                                   t.test(value ~ treatment, data = .x)$p.value,
                                   NA)))

# A tibble: 5 x 3
#   group data                        p
#   <fct> <list>                  <dbl>
# 1 a     <data.frame [20 x 3]>  0.790
# 2 b     <data.frame [20 x 3]>  0.0300
# 3 c     <data.frame [20 x 3]>  0.712
# 4 d     <data.frame [20 x 3]>  0.662
# 5 e     <data.frame [20 x 3]> NA
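
The same check can also be pulled out into a small named function with a plain if/else, which may be easier to read than a long one-liner. This is only a sketch: the helper name p_if_two is made up, and nest(data = -group) assumes tidyr 1.0 or later (older versions used nest(-group) as in the question).

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(1)
df <- data.frame(id = paste0('ID-', 1:100),
                 group = rep(c('a', 'b', 'c', 'd', 'e'), each = 20),
                 treatment = c(rep(c('x', 'y'), 40), rep('x', 20)),
                 value = runif(100))

# Hypothetical helper: run the t test only when exactly two treatments are present
p_if_two <- function(d) {
  if (n_distinct(d$treatment) == 2) {
    t.test(value ~ treatment, data = d)$p.value
  } else {
    NA_real_  # typed NA so map_dbl() always receives a double
  }
}

df_analysis <- df %>%
  nest(data = -group) %>%   # assumes tidyr >= 1.0
  mutate(p = map_dbl(data, p_if_two))

df_analysis
```

Unlike a catch-all error handler, an explicit check like this only skips the groups with a missing treatment, so any other error in t.test() would still surface.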

Upvotes: 1
