Em Laskey
Em Laskey

Reputation: 607

Extract NA Values in Summary Output of glm

I am running a logistic regression on many different subsets of a large data frame. To do so, I use the follwoing code (using dplyrand purrr):

# define model to be run
mod_fun <- function(df) {
  glm(presence ~ transect, data = df, family = "binomial")
}

# nest data and run model
mod.glm <- dat %>%
  nest(-c(region, fYear, species, road)) %>%
  mutate(model = map(data, mod_fun))

# define functions to extract model coefficients
b_fun <- function(mod) {
  coef(summary(mod))[2]
}
p_fun <- function(mod) {
  coef(summary(mod))[8]
}

# extract coefficients
slope<-mod.glm %>% group_by(species, region, fYear, road) %>%
  transmute(beta = map_dbl(model, b_fun),
            p_val = map_dbl(model, p_fun))

As you can see, I only want to extract the estimate and p-value of the slope (called transect). To do so, I use the indexing coef(summary(mod))[2]etc. The problem is that there are also some subsets in my data frame resulting in over-determined systems where some coefficients will be set to NA. Using coef(summary(mod))[2] extracts the 2nd value of the coef()output and as NAs are ignored in coef(), this will no longer be the estimate of transect I want to extract. So far I tried unsing coef(summary(mod_2), complete = TRUE)(--> nothing changes, NAs still not showed) and adressing the values directly coef(summary(mod_2), complete = TRUE)["transect","Estimate"](--> throws an error). Does anyone know how I could solve this issue?

What I tried so far:

# two example models; mod_2 will result in NAs
mod_1 <- glm(presence ~ transect, data = dat[dat$fYear == 1&  dat$species=="Plantago lanceolata",], family = "binomial")
mod_2 <- glm(presence ~ transect, data = dat[dat$fYear == 2&  dat$species=="Plantago lanceolata",], family = "binomial")

coef(summary(mod_1))[2] # works fine
coef(summary(mod_2))[2] # not the value I want

coef(summary(mod_1), complete = TRUE)["transect","Estimate"] # works fine
coef(summary(mod_2), complete = TRUE)["transect","Estimate"] # error

coef(summary(mod_2), complete = TRUE) # NAs for transect are still not displayed

summary(mod_2)$coefficients["transect","Estimate"] # is not working either

Data:

dput(dat)
structure(list(region = c("HWI", "HWI", "HWI", "HWI", "HWI", 
"HWI", "HWI", "HWI", "HWI", "HWI", "HWI", "HWI", "HWI", "HWI", 
"HWI", "HWI", "HWI", "HWI", "HWI", "HWI", "HWI", "HWI", "HWI", 
"HWI", "HWI", "HWI", "HWI", "HWI", "HWI"), road = c("MK", "MK", 
"MK", "MK", "MK", "MK", "MK", "MK", "MK", "MK", "MK", "MK", "MK", 
"MK", "MK", "MK", "MK", "MK", "MK", "MK", "MK", "MK", "MK", "MK", 
"MK", "MK", "MK", "MK", "MK"), transect = c(1L, 1L, 2L, 2L, 3L, 
3L, 4L, 4L, 4L, 4L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 
11L, 11L, 12L, 12L, 13L, 13L, 15L, 15L, 1L), fYear = c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L), species = c("Plantago lanceolata", 
"Poa pratensis", "Plantago lanceolata", "Poa pratensis", "Plantago lanceolata", 
"Poa pratensis", "Plantago lanceolata", "Poa pratensis", "Plantago lanceolata", 
"Poa pratensis", "Plantago lanceolata", "Poa pratensis", "Plantago lanceolata", 
"Poa pratensis", "Plantago lanceolata", "Poa pratensis", "Plantago lanceolata", 
"Poa pratensis", "Plantago lanceolata", "Poa pratensis", "Plantago lanceolata", 
"Poa pratensis", "Plantago lanceolata", "Poa pratensis", "Plantago lanceolata", 
"Poa pratensis", "Plantago lanceolata", "Poa pratensis", "Plantago lanceolata"
), presence = c(1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -29L), groups = structure(list(
    fYear = c(1L, 1L, 2L), road = c("MK", "MK", "MK"), species = c("Plantago lanceolata", 
    "Poa pratensis", "Plantago lanceolata"), .rows = list(c(1L, 
    3L, 5L, 7L, 9L, 11L, 13L, 15L, 17L, 19L, 21L, 23L, 25L, 27L
    ), c(2L, 4L, 6L, 8L, 10L, 12L, 14L, 16L, 18L, 20L, 22L, 24L, 
    26L, 28L), 29L)), row.names = c(NA, -3L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE))

Thanks for your help!

Upvotes: 0

Views: 396

Answers (2)

aosmith
aosmith

Reputation: 36086

I don't see how to extract the NA rows from the coefficients table. The answer, instead, may be to add in the NA rows after extracting the elements you want. This can be done with complete().

In this example I use broom::tidy() because it makes it easy to get coefficients, a measure of uncertainty, and results of tests. This means I have to filter out the intercepts after the fact, but you could certainly do something similar with your functions if you add a column to complete() on.

library(purrr)
library(tidyr)
library(dplyr)

mod.glm %>%
     group_by(species, region, fYear, road) %>%
     transmute(results = map(model, broom::tidy) ) %>%
     unnest(results) %>%
     complete(term = "transect") %>%
     filter(term != "(Intercept)") %>%
     ungroup()

# A tibble: 3 x 9
  species          region fYear road  term    estimate std.error statistic p.value
  <chr>            <chr>  <int> <chr> <chr>      <dbl>     <dbl>     <dbl>   <dbl>
1 Plantago lanceo~ HWI        1 MK    transe~  -42.7   31127.     -0.00137   0.999
2 Plantago lanceo~ HWI        2 MK    transe~   NA        NA      NA        NA    
3 Poa pratensis    HWI        1 MK    transe~   -0.206     0.188  -1.10      0.272

Upvotes: 4

aosmith
aosmith

Reputation: 36086

Going a completely different route, you could change your extraction functions to return NA when they give errors. This is a job for functions like tryCatch(), but I've found possibly() from purrr to be pretty handy for this sort of task.

possibly() wraps around a function. The otherwise arguments says what value to return if an error occurs when using the function.

Here are your two functions, wrapped in possibly(). I've changed them to specifically be working with the "transect" row of the coefficients summary so this will error if that doesn't exist.

b_fun <- possibly(
     function(mod) {
          coef(summary(mod))["transect", 1]
          }, otherwise = NA)

p_fun <- possibly(
     function(mod) {
          coef(summary(mod))["transect", 4]
          }, otherwise = NA)

# extract coefficients
mod.glm %>%
     group_by(species, region, fYear, road) %>%
     transmute(beta = map_dbl(model, b_fun),
               p_val = map_dbl(model, p_fun) ) %>%
     ungroup() 

# A tibble: 3 x 6
  species             region fYear road     beta  p_val
  <chr>               <chr>  <int> <chr>   <dbl>  <dbl>
1 Plantago lanceolata HWI        1 MK    -42.7    0.999
2 Poa pratensis       HWI        1 MK     -0.206  0.272
3 Plantago lanceolata HWI        2 MK     NA     NA    

Upvotes: 4

Related Questions