Reputation: 8366
Is there any way to add a column to the result of the broom package's tidy
function that relates the term column back to both the original names used in the formula
argument and their columns in the data
argument?
For example if I run the following I get:
library(broom)  # provides tidy()
library(dplyr)
mod <- glm(mpg ~ wt + qsec + as.factor(carb), data = mtcars)
tidy(mod)
# term estimate std.error statistic p.value
# 1 (Intercept) 21.132995090 7.5756463 2.78959633 1.017187e-02
# 2 wt -4.916303175 0.6747590 -7.28601380 1.584408e-07
# 3 qsec 0.843355538 0.3930252 2.14580532 4.221188e-02
# 4 as.factor(carb)2 0.004133826 1.5321134 0.00269812 9.978695e-01
# 5 as.factor(carb)3 -0.755346006 2.3451222 -0.32209239 7.501715e-01
# 6 as.factor(carb)4 -0.489721798 2.0628564 -0.23739985 8.143615e-01
# 7 as.factor(carb)6 -0.886846134 3.4443957 -0.25747510 7.990068e-01
# 8 as.factor(carb)8 -0.894783610 3.7496630 -0.23863041 8.134180e-01
What I am looking for is something like this:
# term estimate std.error statistic p.value term_base
# 1 (Intercept) 21.132995090 7.5756463 2.78959633 1.017187e-02
# 2 wt -4.916303175 0.6747590 -7.28601380 1.584408e-07 wt
# 3 qsec 0.843355538 0.3930252 2.14580532 4.221188e-02 qsec
# 4 as.factor(carb)2 0.004133826 1.5321134 0.00269812 9.978695e-01 carb
# 5 as.factor(carb)3 -0.755346006 2.3451222 -0.32209239 7.501715e-01 carb
# 6 as.factor(carb)4 -0.489721798 2.0628564 -0.23739985 8.143615e-01 carb
# 7 as.factor(carb)6 -0.886846134 3.4443957 -0.25747510 7.990068e-01 carb
# 8 as.factor(carb)8 -0.894783610 3.7496630 -0.23863041 8.134180e-01 carb
I'm not bothered whether the first row of this new column is empty, Intercept, or 1. I just need something that can match the term column to the original variable names passed to the formula.
Edit
It would be good if it didn't depend on using as.factor in the formula, e.g. it would also work on:
mod <- glm(mpg ~ wt + qsec + carb, data = mtcars %>% mutate(carb = factor(carb)))
tidy(mod)
# term estimate std.error statistic p.value
# 1 (Intercept) 21.132995090 7.5756463 2.78959633 1.017187e-02
# 2 wt -4.916303175 0.6747590 -7.28601380 1.584408e-07
# 3 qsec 0.843355538 0.3930252 2.14580532 4.221188e-02
# 4 carb2 0.004133826 1.5321134 0.00269812 9.978695e-01
# 5 carb3 -0.755346006 2.3451222 -0.32209239 7.501715e-01
# 6 carb4 -0.489721798 2.0628564 -0.23739985 8.143615e-01
# 7 carb6 -0.886846134 3.4443957 -0.25747510 7.990068e-01
# 8 carb8 -0.894783610 3.7496630 -0.23863041 8.134180e-01
Upvotes: 2
Views: 110
Reputation: 887118
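We can use regex to create the 'term_base' column. The gsub strips everything up to the opening parenthesis and everything from the closing parenthesis onward, and the sub then blanks out the leftover "Intercept":
tidy(mod) %>%
  mutate(term_base = sub("Intercept", "", gsub(".*\\(|\\).*", "", term)))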
# term estimate std.error statistic p.value term_base
#1 (Intercept) 21.132995090 7.5756463 2.78959633 1.017187e-02
#2 wt -4.916303175 0.6747590 -7.28601380 1.584408e-07 wt
#3 qsec 0.843355538 0.3930252 2.14580532 4.221188e-02 qsec
#4 as.factor(carb)2 0.004133826 1.5321134 0.00269812 9.978695e-01 carb
#5 as.factor(carb)3 -0.755346006 2.3451222 -0.32209239 7.501715e-01 carb
#6 as.factor(carb)4 -0.489721798 2.0628564 -0.23739985 8.143615e-01 carb
#7 as.factor(carb)6 -0.886846134 3.4443957 -0.25747510 7.990068e-01 carb
#8 as.factor(carb)8 -0.894783610 3.7496630 -0.23863041 8.134180e-01 carb
The as.factor can be removed from the 'term' as well if we mutate 'carb' to a factor before the glm step:
mtcars %>%
  mutate(carb = factor(carb)) %>%
  glm(formula = mpg ~ wt + qsec + carb, data = .) %>%
  tidy(.) %>%
  mutate(term_base = sub("\\(.*\\)|\\d+", "", term))
# term estimate std.error statistic p.value term_base
#1 (Intercept) 21.132995090 7.5756463 2.78959633 1.017187e-02
#2 wt -4.916303175 0.6747590 -7.28601380 1.584408e-07 wt
#3 qsec 0.843355538 0.3930252 2.14580532 4.221188e-02 qsec
#4 carb2 0.004133826 1.5321134 0.00269812 9.978695e-01 carb
#5 carb3 -0.755346006 2.3451222 -0.32209239 7.501715e-01 carb
#6 carb4 -0.489721798 2.0628564 -0.23739985 8.143615e-01 carb
#7 carb6 -0.886846134 3.4443957 -0.25747510 7.990068e-01 carb
#8 carb8 -0.894783610 3.7496630 -0.23863041 8.134180e-01 carb
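As a regex-free alternative (a sketch only, not the approach used above): R records which formula term produced each coefficient in the "assign" attribute of the model matrix, so the mapping can be read from the model instead of being parsed out of the term strings. This may matter because the `\\d+` pattern would also strip the digit from a variable named, say, x1, and would leave non-numeric factor levels in place. The names label_by_coef, base_vars, term_label and term_base are made up for illustration, and str2lang() assumes R 3.6 or later.

```r
library(broom)
library(dplyr)

mod <- glm(mpg ~ wt + qsec + as.factor(carb), data = mtcars)

# "assign" holds, for each model-matrix column, the index of the formula
# term it came from (0 means the intercept).
mm <- model.matrix(mod)
labels <- c("(Intercept)", attr(terms(mod), "term.labels"))
label_by_coef <- setNames(labels[attr(mm, "assign") + 1], colnames(mm))

# Hypothetical helper: reduce a term label such as "as.factor(carb)" to the
# variable name(s) it uses; interactions are joined with ":".
base_vars <- function(label) {
  if (label == "(Intercept)") return("")
  paste(all.vars(str2lang(label)), collapse = ":")
}

# Look up by coefficient name so the result does not depend on row order.
res <- tidy(mod) %>%
  mutate(term_label = unname(label_by_coef[term]),
         term_base  = vapply(term_label, base_vars, character(1),
                             USE.NAMES = FALSE))
```

Here term_label keeps the term exactly as written in the formula (e.g. as.factor(carb)), while term_base holds the underlying column name in the data. The same code should work unchanged when carb is converted to a factor before fitting, since nothing depends on the as.factor wrapper.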
Upvotes: 3