Reputation: 1248
I'm wondering if there is a better way is to get linear regression coefficients as columns in dplyr. Here is some sample data.
mydata <-
data.frame(
Site = c(1,1,1,1,1,1,1,1),
Site1 = c(2,3,2,3,2,3,2,3),
Age = c(17, 52, 19, 18, 62, 53, 41, 24),
Gender = c(1,2,1,1,2,2,2,1),
Outcome = c(1,1,1,1,0,0,0,1)
)
I wrote this helper function to turn summary(.data)$coefficients
into columns
GetCoefficients <- function(.data){
AllData <- data.frame()
AllData[1, ] <- ""
col_names <- colnames(summary(.data)$coefficients)
row_names <- rownames(summary(.data)$coefficients)
row_len <- length(row_names)
col_len <- length(col_names)-1
x <- summary(.data)$coefficients
for (i in 1:length(x)){
AllData <- AllData %>%
mutate(!!paste0(row_names[ifelse(i%%row_len != 0, i%%row_len, row_len)],
"_",col_names[ceiling(i/col_len)]) := x[i])
}
return(AllData)
}
Using the helper function I can put coefficients into my data.frame()
Linear_regression <- mydata %>%
pivot_longer(starts_with("Site"),
names_to = ".value",
names_pattern = "(^Site)") %>%
group_by(Site) %>%
do(Reg = lm(Outcome ~ Age + Gender, data = .)) %>%
mutate(rsq = summary(Reg)$r.squared) %>%
mutate(fun = GetCoefficients(Reg))
Upvotes: 3
Views: 1462
Reputation: 3700
The function broom::tidy
extracts the coefficient estimates as well as their standard errors, t-statistics & p-values into rows.
library("tidyverse")
mydata <- tibble::tribble(
~Age, ~Gender, ~Outcome, ~Site,
17, 1, 1, "1",
17, 1, 1, "2",
52, 2, 1, "1",
52, 2, 1, "3",
19, 1, 1, "1",
19, 1, 1, "2",
18, 1, 1, "1",
18, 1, 1, "3",
62, 2, 0, "1",
62, 2, 0, "2",
53, 2, 0, "1",
53, 2, 0, "3",
41, 2, 0, "1",
41, 2, 0, "2",
24, 1, 1, "1",
24, 1, 1, "3"
)
mydata %>%
group_by(
Site
) %>%
group_modify(
~ broom::tidy(lm(Outcome ~ Age + Gender, data = .))
)
#> Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
#> # A tibble: 9 × 6
#> # Groups: Site [3]
#> Site term estimate std.error statistic p.value
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 (Intercept) 1.75e+ 0 5.37e- 1 3.26e+ 0 2.25e- 2
#> 2 1 Age -9.13e-18 2.44e- 2 -3.74e-16 1 e+ 0
#> 3 1 Gender -7.50e- 1 8.40e- 1 -8.92e- 1 4.13e- 1
#> 4 2 (Intercept) 2 e+ 0 4.20e-16 4.76e+15 1.34e-16
#> 5 2 Age 9.08e-18 1.49e-17 6.10e- 1 6.51e- 1
#> 6 2 Gender -1 e+ 0 5.46e-16 -1.83e+15 3.48e-16
#> 7 3 (Intercept) 1.22e+ 0 2.03e+ 0 6.00e- 1 6.56e- 1
#> 8 3 Age -2.70e- 2 1.62e- 1 -1.67e- 1 8.95e- 1
#> 9 3 Gender 3.51e- 1 5.16e+ 0 6.82e- 2 9.57e- 1
Sometimes however, we would like to extract only the coefficient estimates into columns.
mydata %>%
group_by(
Site
) %>%
group_modify(
~ bind_rows(coefficients(lm(Outcome ~ Age + Gender, data = .)))
)
#> # A tibble: 3 × 4
#> # Groups: Site [3]
#> Site `(Intercept)` Age Gender
#> <chr> <dbl> <dbl> <dbl>
#> 1 1 1.75 -9.13e-18 -0.750
#> 2 2 2 9.08e-18 -1
#> 3 3 1.22 -2.70e- 2 0.351
Created on 2022-04-20 by the reprex package (v2.0.1)
Upvotes: 1
Reputation: 79194
Here is a combination of tidyverse
and broom
package to get your desired output.
Very handy here is group_split
-> you get a list and then you iterate with purrr
s map_dfr
(by the way with map_dfr
you get a dataframe otherwise with map
you get a list) your regression lm(...
through each list element. Using broom
s glance gives the desired output:
library(tidyverse)
library(broom)
mydata %>%
pivot_longer(starts_with("Site"),
names_to = ".value",
names_pattern = "(^Site)") %>%
mutate(Site=as.factor(Site)) %>%
group_by(Site) %>%
group_split() %>%
map_dfr(.f = function(df){
lm(Outcome ~ Age+Gender, data=df) %>%
glance() %>%
add_column(Site = unique(df$Site), .before = 1)
})
Site r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 1 0.6 0.44 3.87e- 1 3.75e+ 0 1.01e- 1 2 -1.88 11.8 12.1 7.5 e- 1 5 8
2 2 1 1 2.22e-16 1.01e+31 2.22e-16 2 141. -275. -277. 4.93e-32 1 4
3 3 0.351 -0.946 6.97e- 1 2.71e- 1 8.05e- 1 2 -1.46 10.9 8.47 4.86e- 1 1 4
Upvotes: 6