Reputation: 27
I have some data like this
group_name | x | y
------------------
a | 1 | 2
a | 2 | 4
a | 3 | 6
b | 1 | 4
b | 2 | 3
b | 3 | 2
c | 1 | 2
c | 2 | 5
c | 3 | 8
I would like to group it by group_name, and use Dplyr's summarize function to create a column containing a linear model lm(y ~ x) for each group. Is it even possible? If not, what are the alternatives for creating models for each group?
Thank you in advance
Upvotes: 1
Views: 72
Reputation: 66915
Adapting the example from https://cran.r-project.org/web/packages/broom/vignettes/broom_and_dplyr.html:
library(tidyverse); library(broom)
df %>%
nest(-group_name) %>%
mutate(fit = map(data, ~lm(y ~ x, data = .x)),
tidied = map(fit, tidy)) %>%
unnest(tidied)
group_name term estimate std.error statistic p.value
1 a (Intercept) 0 0.000000e+00 NaN NaN
2 a x 2 0.000000e+00 Inf 0.000000e+00
3 b (Intercept) 5 1.017536e-15 4.913830e+15 1.295567e-16
4 b x -1 4.710277e-16 -2.123017e+15 2.998656e-16
5 c (Intercept) -1 1.356715e-15 -7.370745e+14 8.637116e-16
6 c x 3 6.280370e-16 4.776789e+15 1.332736e-16
Edit: One way to get the predictions is to use augment
from broom
:
library(tidyverse); library(broom)
df %>%
nest(-group_name) %>%
mutate(fit = map(data, ~lm(y ~ x, data = .x)),
predictions = map(fit, augment)) %>%
unnest(predictions)
group_name y x .fitted .se.fit .resid .hat .sigma .rownames .cooksd .std.resid
1 a 2 1 2 0.000000e+00 0.000000e+00 0.8333333 NaN <NA> NA NA
2 a 4 2 4 0.000000e+00 0.000000e+00 0.3333333 NaN <NA> NA NA
3 a 6 3 6 0.000000e+00 0.000000e+00 0.8333333 NaN <NA> NA NA
4 b 4 1 4 6.080942e-16 2.719480e-16 0.8333333 NaN 4 2.50 1
5 b 3 2 3 3.845925e-16 -5.438960e-16 0.3333333 NaN 5 0.25 -1
6 b 2 3 2 6.080942e-16 2.719480e-16 0.8333333 Inf 6 2.50 1
7 c 2 1 2 8.107923e-16 -3.625973e-16 0.8333333 NaN 7 2.50 -1
8 c 5 2 5 5.127900e-16 7.251946e-16 0.3333333 NaN 8 0.25 1
9 c 8 3 8 8.107923e-16 -3.625973e-16 0.8333333 Inf 9 2.50 -1
Upvotes: 2
Reputation: 1792
Here's one way of doing it.
I had to slightly change your test data, because I think there were issues of perfect colinearity.
df <- data.frame(stringsAsFactors=FALSE,
group.name = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
x = c(1, 2, 3.5, 1, 2.5, 3, 1, 2, 3.5),
y = c(2, 4, 6, 4, 3, 2, 2, 5, 8)
)
library(dplyr)
groups <- unique(df$group.name)
groups
for (i in groups){
df_subgroup <- filter(df, group.name==i)
print(paste("group", i))
model <- lm(y ~ x, data = df_subgroup)
print(summary(model))
}
And this is what you get. I used the stargazer package to make the output easier to read, but you could just use summary(model)
#> [1] "group a"
#>
#> ===============================================
#> Dependent variable:
#> ---------------------------
#> y
#> -----------------------------------------------
#> x 1.579*
#> (0.182)
#>
#> Constant 0.579
#> (0.437)
#>
#> -----------------------------------------------
#> Observations 3
#> R2 0.987
#> Adjusted R2 0.974
#> Residual Std. Error 0.324 (df = 1)
#> F Statistic 75.000* (df = 1; 1)
#> ===============================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
#> [1] "group b"
#>
#> ===============================================
#> Dependent variable:
#> ---------------------------
#> y
#> -----------------------------------------------
#> x -0.923
#> (0.266)
#>
#> Constant 5.000*
#> (0.620)
#>
#> -----------------------------------------------
#> Observations 3
#> R2 0.923
#> Adjusted R2 0.846
#> Residual Std. Error 0.392 (df = 1)
#> F Statistic 12.000 (df = 1; 1)
#> ===============================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
#> [1] "group c"
#>
#> ===============================================
#> Dependent variable:
#> ---------------------------
#> y
#> -----------------------------------------------
#> x 2.368*
#> (0.273)
#>
#> Constant -0.132
#> (0.656)
#>
#> -----------------------------------------------
#> Observations 3
#> R2 0.987
#> Adjusted R2 0.974
#> Residual Std. Error 0.487 (df = 1)
#> F Statistic 75.000* (df = 1; 1)
#> ===============================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
Upvotes: 0