Reputation: 109

Estimating separate regressions and generating predictions by group using lapply

I want to use the same regression model for each group and use the coefficients (different across groups) to add predicted values in the same data set. I can do it using a loop, but how can I do it with lapply?

data <- data.frame(x = rnorm(100), y = rnorm(100), id = rep(c('a', 'b', 'c', 'd'), 25))
for (i in unique(data$id)){
  model <- lm(y ~ x, data = subset(data, id == i))
  data$pred[data$id == i] <- predict(model, subset(data, id == i))
}

Upvotes: 1

Answers (4)

akrun

Reputation: 887831

We may also do this in tidyverse as this can be stored and reused in columns

library(dplyr)
library(broom)
library(tidyr)
data1 <- data %>% 
   nest_by(id) %>% 
   mutate(model = list(lm(y ~ x, data = data)), 
            Predict = list(predict(model, data))) %>%
 ungroup
data1
# A tibble: 4 x 4
  id                  data model  Predict   
  <chr> <list<tibble[,2]>> <list> <list>    
1 a               [25 × 2] <lm>   <dbl [25]>
2 b               [25 × 2] <lm>   <dbl [25]>
3 c               [25 × 2] <lm>   <dbl [25]>
4 d               [25 × 2] <lm>   <dbl [25]>

If we want to convert the list column, use unnest

data1 %>% 
   select(id, Predict) %>%
   unnest(Predict)
# A tibble: 100 x 2
   id      Predict
   <chr>     <dbl>
 1 a      0.0157  
 2 a      0.0448  
 3 a      0.0141  
 4 a      0.0220  
 5 a      0.000245
 6 a     -0.0245  
 7 a      0.0441  
 8 a      0.0229  
 9 a      0.00321 
10 a     -0.00271 
# … with 90 more rows

also, can extract the components from the model with tidy from broom

data1 %>%
    select(id, model) %>% 
    rowwise %>%
    mutate(model = list(tidy(model))) %>% 
    unnest(c(model))
# A tibble: 8 x 6
  id    term          estimate std.error statistic p.value
  <chr> <chr>            <dbl>     <dbl>     <dbl>   <dbl>
1 a     (Intercept)  0.0106        0.171  0.0621     0.951
2 a     x           -0.0260        0.223 -0.116      0.908
3 b     (Intercept)  0.0337        0.149  0.226      0.823
4 b     x            0.156         0.123  1.27       0.217
5 c     (Intercept) -0.00722       0.223 -0.0324     0.974
6 c     x           -0.209         0.207 -1.01       0.321
7 d     (Intercept) -0.0959        0.174 -0.550      0.587
8 d     x           -0.0000331     0.214 -0.000154   1.00

Upvotes: 2

Anoushiravan R

Reputation: 21938

You can also use this one in base R:

do.call(rbind, lapply(unique(data$id), function(a) {
  tmp <- subset(data, id == a)
  model <- lm(y ~ x, data = tmp)
  tmp$pred <- predict(model)
  tmp
}))

Upvotes: 2

Peace Wang

Reputation: 2419

lapply is not necessary. The following solution to regress by group seems more straightforward.

library(data.table)
setDT(data)[,pred2:=predict(lm(y ~ x, data = .SD)),by=id]

Upvotes: 2

Ronak Shah

Reputation: 389235

You can use lapply as -

data <- data[order(data$id), ]

data$pred <- unlist(lapply(split(data, data$id), function(df) {
  model <- lm(y ~ x, data = df)
  predict(model, df)
}))

Can also use by -

data$pred <- unlist(by(data, data$id, function(df) {
  model <- lm(y ~ x, data = df)
  predict(model, df)
}))

Upvotes: 2

Estimating separate regressions and generating predictions by group using lapply

Answers (4)

Related Questions