Reputation: 109
I want to use the same regression model for each group and use the coefficients (different across groups) to add predicted values in the same data set. I can do it using a loop, but how can I do it with lapply
?
data <- data.frame(x = rnorm(100), y = rnorm(100), id = rep(c('a', 'b', 'c', 'd'), 25))
for (i in unique(data$id)){
model <- lm(y ~ x, data = subset(data, id == i))
data$pred[data$id == i] <- predict(model, subset(data, id == i))
}
Upvotes: 1
Views: 141
Reputation: 886938
We may also do this in tidyverse
as this can be stored and reused in columns
library(dplyr)
library(broom)
library(tidyr)
data1 <- data %>%
nest_by(id) %>%
mutate(model = list(lm(y ~ x, data = data)),
Predict = list(predict(model, data))) %>%
ungroup
data1
# A tibble: 4 x 4
id data model Predict
<chr> <list<tibble[,2]>> <list> <list>
1 a [25 × 2] <lm> <dbl [25]>
2 b [25 × 2] <lm> <dbl [25]>
3 c [25 × 2] <lm> <dbl [25]>
4 d [25 × 2] <lm> <dbl [25]>
If we want to convert the list
column, use unnest
data1 %>%
select(id, Predict) %>%
unnest(Predict)
# A tibble: 100 x 2
id Predict
<chr> <dbl>
1 a 0.0157
2 a 0.0448
3 a 0.0141
4 a 0.0220
5 a 0.000245
6 a -0.0245
7 a 0.0441
8 a 0.0229
9 a 0.00321
10 a -0.00271
# … with 90 more rows
also, can extract the components from the model with tidy
from broom
data1 %>%
select(id, model) %>%
rowwise %>%
mutate(model = list(tidy(model))) %>%
unnest(c(model))
# A tibble: 8 x 6
id term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 a (Intercept) 0.0106 0.171 0.0621 0.951
2 a x -0.0260 0.223 -0.116 0.908
3 b (Intercept) 0.0337 0.149 0.226 0.823
4 b x 0.156 0.123 1.27 0.217
5 c (Intercept) -0.00722 0.223 -0.0324 0.974
6 c x -0.209 0.207 -1.01 0.321
7 d (Intercept) -0.0959 0.174 -0.550 0.587
8 d x -0.0000331 0.214 -0.000154 1.00
Upvotes: 2
Reputation: 21908
You can also use this one in base R:
do.call(rbind, lapply(unique(data$id), function(a) {
tmp <- subset(data, id == a)
model <- lm(y ~ x, data = tmp)
tmp$pred <- predict(model)
tmp
}))
Upvotes: 2
Reputation: 2419
lapply
is not necessary. The following solution to regress by group seems more straightforward.
library(data.table)
setDT(data)[,pred2:=predict(lm(y ~ x, data = .SD)),by=id]
Upvotes: 2
Reputation: 388797
You can use lapply
as -
data <- data[order(data$id), ]
data$pred <- unlist(lapply(split(data, data$id), function(df) {
model <- lm(y ~ x, data = df)
predict(model, df)
}))
Can also use by
-
data$pred <- unlist(by(data, data$id, function(df) {
model <- lm(y ~ x, data = df)
predict(model, df)
}))
Upvotes: 2