Reputation: 317
I am fitting linear models across a large dataset that is unbalanced (not all contrasts are present for all groupings). Is there an efficient way to ignore groupings where there are fewer than 2 contrasts? In the examples below, testData1 represents a balanced dataset where the workflow works correctly, and testData2 represents an unbalanced dataset that throws a contrast error.
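library(dplyr)
library(tidyr)
library(purrr)
# Fit a linear model for one group and return a tidy coefficient table
aovFxn <- function(dat) {
  lm(outcomeVar ~ predVar1, data = dat) %>%
    broom::tidy()
}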
testData1 <- data.frame(
  groupVar = rep(c('a', 'b'), each = 12),
  predVar1 = rep(c('x', 'y', 'z'), each = 4, times = 2),
  outcomeVar = sample(1:100, 24)
)
testData2 <- data.frame(
  groupVar = rep(c('a', 'b'), each = 12),
  predVar1 = c(rep(c('x', 'y', 'z'), each = 4),
               rep('x', 12)),
  outcomeVar = sample(1:100, 24)
)
testStats1 <- testData1 %>%
  nest(groupData = -groupVar) %>%
  mutate(df = purrr::map(groupData, aovFxn)) %>%
  unnest_legacy(df)
testStats2 <- testData2 %>%
  nest(groupData = -groupVar) %>%
  mutate(df = purrr::map(groupData, aovFxn)) %>%
  unnest_legacy(df)
Upvotes: 1
Views: 82
Reputation: 887118
We can use either tryCatch or purrr::possibly to return a fallback value when the model fit throws an error:
library(dplyr)
library(tidyr)
library(purrr)
# Wrap aovFxn so that an error returns NULL instead of stopping the pipeline
paovFxn <- possibly(aovFxn, otherwise = NULL)
testData2 %>%
  nest(groupData = -groupVar) %>%
  mutate(df = purrr::map(groupData, paovFxn)) %>%
  unnest(df) %>%
  select(-groupData)
-output
# A tibble: 3 × 6
  groupVar term        estimate std.error statistic p.value
  <chr>    <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 a        (Intercept)    42.5       17.3    2.45    0.0367
2 a        predVar1y      19.7       24.5    0.805   0.441
3 a        predVar1z       2.25      24.5    0.0917  0.929
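The tryCatch route mentioned above works the same way. The following is only a minimal sketch: the wrapper name safeAovFxn is hypothetical, and the set.seed call is an assumption added so the random outcome values are reproducible. The wrapper returns NULL when lm() fails, and unnest() then drops that group.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Same model helper as in the question
aovFxn <- function(dat) {
  lm(outcomeVar ~ predVar1, data = dat) %>%
    broom::tidy()
}

# Hypothetical wrapper: return NULL instead of raising when the fit errors
safeAovFxn <- function(dat) {
  tryCatch(aovFxn(dat), error = function(e) NULL)
}

set.seed(1)  # assumption: seed added only for reproducibility
testData2 <- data.frame(
  groupVar = rep(c('a', 'b'), each = 12),
  predVar1 = c(rep(c('x', 'y', 'z'), each = 4), rep('x', 12)),
  outcomeVar = sample(1:100, 24)
)

res <- testData2 %>%
  nest(groupData = -groupVar) %>%
  mutate(df = map(groupData, safeAovFxn)) %>%
  unnest(df) %>%          # NULL entries are dropped (keep_empty = FALSE)
  select(-groupData)
```

Note that both tryCatch and possibly swallow every error, not just the contrast error, so an unrelated failure in a group would also be silently skipped.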
Another option is an if condition that fits the model only when predVar1 has more than one distinct value (otherwise the if returns NULL):
testData2 %>%
  nest(groupData = -groupVar) %>%
  mutate(df = map(groupData, ~ if (n_distinct(.x$predVar1) > 1) aovFxn(.x))) %>%
  unnest(df, keep_empty = TRUE) %>%
  select(-groupData)
-output
# A tibble: 4 × 6
  groupVar term        estimate std.error statistic p.value
  <chr>    <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 a        (Intercept)    42.5       17.3    2.45    0.0367
2 a        predVar1y      19.7       24.5    0.805   0.441
3 a        predVar1z       2.25      24.5    0.0917  0.929
4 b        <NA>           NA         NA     NA      NA
NOTE: keep_empty defaults to FALSE, so without keep_empty = TRUE the row for groupVar 'b' is dropped from the output.
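If the single-level groups are not needed in the output at all, the same n_distinct check could also be applied before nesting, so that lm() is never called on them. This is a sketch under that assumption rather than part of the original answer; the set.seed call is added only for reproducibility.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Same model helper as in the question
aovFxn <- function(dat) {
  lm(outcomeVar ~ predVar1, data = dat) %>%
    broom::tidy()
}

set.seed(1)  # assumption: seed added only for reproducibility
testData2 <- data.frame(
  groupVar = rep(c('a', 'b'), each = 12),
  predVar1 = c(rep(c('x', 'y', 'z'), each = 4), rep('x', 12)),
  outcomeVar = sample(1:100, 24)
)

res <- testData2 %>%
  group_by(groupVar) %>%
  filter(n_distinct(predVar1) > 1) %>%  # keep groups with at least 2 levels
  ungroup() %>%
  nest(groupData = -groupVar) %>%
  mutate(df = map(groupData, aovFxn)) %>%
  unnest(df) %>%
  select(-groupData)
```

Unlike the error-catching approaches, this filters only on the known cause of the contrast error, so any other modeling error would still surface.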
Upvotes: 1