Adding a column for the category of glm coefficients in broom results

Question

Is there any way to add a column to the result of the broom package's tidy function that relates the term column back to both the original names used in the formula argument and their columns in the data argument?

For example, if I run the following I get:

library(broom)  # provides tidy()
library(dplyr)

mod <- glm(mpg ~ wt + qsec + as.factor(carb), data = mtcars)

tidy(mod)

#               term     estimate std.error   statistic      p.value
# 1      (Intercept) 21.132995090 7.5756463  2.78959633 1.017187e-02
# 2               wt -4.916303175 0.6747590 -7.28601380 1.584408e-07
# 3             qsec  0.843355538 0.3930252  2.14580532 4.221188e-02
# 4 as.factor(carb)2  0.004133826 1.5321134  0.00269812 9.978695e-01
# 5 as.factor(carb)3 -0.755346006 2.3451222 -0.32209239 7.501715e-01
# 6 as.factor(carb)4 -0.489721798 2.0628564 -0.23739985 8.143615e-01
# 7 as.factor(carb)6 -0.886846134 3.4443957 -0.25747510 7.990068e-01
# 8 as.factor(carb)8 -0.894783610 3.7496630 -0.23863041 8.134180e-01

What I am looking for is something like this:

#               term     estimate std.error   statistic      p.value   term_base
# 1      (Intercept) 21.132995090 7.5756463  2.78959633 1.017187e-02 
# 2               wt -4.916303175 0.6747590 -7.28601380 1.584408e-07          wt
# 3             qsec  0.843355538 0.3930252  2.14580532 4.221188e-02        qsec
# 4 as.factor(carb)2  0.004133826 1.5321134  0.00269812 9.978695e-01        carb
# 5 as.factor(carb)3 -0.755346006 2.3451222 -0.32209239 7.501715e-01        carb
# 6 as.factor(carb)4 -0.489721798 2.0628564 -0.23739985 8.143615e-01        carb
# 7 as.factor(carb)6 -0.886846134 3.4443957 -0.25747510 7.990068e-01        carb
# 8 as.factor(carb)8 -0.894783610 3.7496630 -0.23863041 8.134180e-01        carb

I'm not bothered whether the first row in this new column is empty, "Intercept" or 1. I just need something that matches the term column to the original variable names passed to the formula.

Edit

It would be good if it didn't depend on using as.factor in the formula, e.g. if it also worked on:

mod <- glm(mpg ~ wt + qsec + carb, data = mtcars %>% mutate(carb = factor(carb)))

tidy(mod)

#          term     estimate std.error   statistic      p.value
# 1 (Intercept) 21.132995090 7.5756463  2.78959633 1.017187e-02
# 2          wt -4.916303175 0.6747590 -7.28601380 1.584408e-07
# 3        qsec  0.843355538 0.3930252  2.14580532 4.221188e-02
# 4       carb2  0.004133826 1.5321134  0.00269812 9.978695e-01
# 5       carb3 -0.755346006 2.3451222 -0.32209239 7.501715e-01
# 6       carb4 -0.489721798 2.0628564 -0.23739985 8.143615e-01
# 7       carb6 -0.886846134 3.4443957 -0.25747510 7.990068e-01
# 8       carb8 -0.894783610 3.7496630 -0.23863041 8.134180e-01

akrun · Accepted Answer

We can use regular expressions to create the 'term_base' column:

tidy(mod) %>%
        mutate(term_base = sub("Intercept", "", gsub(".*\\(|\\).*", "", term)))
#              term     estimate std.error   statistic      p.value term_base
#1      (Intercept) 21.132995090 7.5756463  2.78959633 1.017187e-02          
#2               wt -4.916303175 0.6747590 -7.28601380 1.584408e-07        wt
#3             qsec  0.843355538 0.3930252  2.14580532 4.221188e-02      qsec
#4 as.factor(carb)2  0.004133826 1.5321134  0.00269812 9.978695e-01      carb
#5 as.factor(carb)3 -0.755346006 2.3451222 -0.32209239 7.501715e-01      carb
#6 as.factor(carb)4 -0.489721798 2.0628564 -0.23739985 8.143615e-01      carb
#7 as.factor(carb)6 -0.886846134 3.4443957 -0.25747510 7.990068e-01      carb
#8 as.factor(carb)8 -0.894783610 3.7496630 -0.23863041 8.134180e-01      carb
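To check what the first pattern does without fitting a model, here is a minimal base-R sketch that applies it to term names typed by hand to match the output above (the vector `terms_formula` is only an illustration, not something broom returns).

```r
# Term names copied by hand from the tidy() output above (illustrative input)
terms_formula <- c("(Intercept)", "wt", "qsec",
                   paste0("as.factor(carb)", c(2, 3, 4, 6, 8)))

# ".*\\(" removes everything up to and including "(",
# "\\).*" removes ")" and anything after it (e.g. the level "2");
# sub() then blanks out the "Intercept" left over from "(Intercept)"
term_base <- sub("Intercept", "", gsub(".*\\(|\\).*", "", terms_formula))

print(term_base)
```

The same pattern reduces a wrapped term such as `log(wt)` to `wt`, but it assumes each term is a single function call on one variable: something like `poly(wt, 2)1` would come out as `wt, 2`.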

The as.factor prefix can also be kept out of 'term' by converting 'carb' to a factor with mutate before the glm step:

mtcars %>%
     mutate(carb = factor(carb)) %>% 
     glm(formula = mpg ~ wt + qsec + carb, data = .) %>% 
     tidy(.) %>%
     mutate(term_base = sub("\\(.*\\)|\\d+", "", term))
#         term     estimate std.error   statistic      p.value term_base
#1 (Intercept) 21.132995090 7.5756463  2.78959633 1.017187e-02          
#2          wt -4.916303175 0.6747590 -7.28601380 1.584408e-07        wt
#3        qsec  0.843355538 0.3930252  2.14580532 4.221188e-02      qsec
#4       carb2  0.004133826 1.5321134  0.00269812 9.978695e-01      carb
#5       carb3 -0.755346006 2.3451222 -0.32209239 7.501715e-01      carb
#6       carb4 -0.489721798 2.0628564 -0.23739985 8.143615e-01      carb
#7       carb6 -0.886846134 3.4443957 -0.25747510 7.990068e-01      carb
#8       carb8 -0.894783610 3.7496630 -0.23863041 8.134180e-01      carb
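Both regexes work from the printed coefficient names, so they depend on how those names happen to look: the `\\d+` pattern relies on the factor levels being numbers and would also strip digits from a variable such as `x1`. A more structural alternative, sketched below under the assumption that the model carries a standard `terms` object (as `lm`/`glm` fits do), uses the `"assign"` attribute of `model.matrix()`, which records for each coefficient column which formula term it belongs to (0 for the intercept). `term_lookup()` is a hypothetical helper written for this example, not part of broom, and `str2lang()` needs R 3.6.0 or later.

```r
library(broom)
library(dplyr)

# Hypothetical helper (not part of broom): one row per model-matrix column,
# giving the formula term it came from and the data column(s) that term uses.
term_lookup <- function(model) {
  mm     <- model.matrix(model)
  labels <- attr(terms(model), "term.labels")  # e.g. "wt", "as.factor(carb)"
  idx    <- attr(mm, "assign")                 # 0 = intercept, k = labels[k]
  term_label <- c("(Intercept)", labels)[idx + 1]
  # Variables referenced inside each term: "as.factor(carb)" -> "carb",
  # an interaction such as "wt:qsec" -> "wt:qsec"
  term_base <- vapply(term_label, function(lab) {
    if (lab == "(Intercept)") return("")
    paste(all.vars(str2lang(lab)), collapse = ":")
  }, character(1), USE.NAMES = FALSE)
  data.frame(term = colnames(mm), term_label = term_label,
             term_base = term_base, stringsAsFactors = FALSE)
}

# Works the same whether the factor is created in the formula or beforehand
mod1 <- glm(mpg ~ wt + qsec + as.factor(carb), data = mtcars)
mod2 <- glm(mpg ~ wt + qsec + carb,
            data = mtcars %>% mutate(carb = factor(carb)))

res1 <- tidy(mod1) %>% left_join(term_lookup(mod1), by = "term")
res2 <- tidy(mod2) %>% left_join(term_lookup(mod2), by = "term")
res1 %>% select(term, term_label, term_base)
```

Joining on `term` keys the mapping on coefficient names, so it does not depend on row order or on every model-matrix column appearing in the `tidy()` output.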
