Use apply() to iterate linear regression models through multiple dependent variables

Question

I'm computing the model outputs for a linear regression for a dependent variable with 45 different id values. How can I use tidy (dplyr, apply, etc.) code to accomplish this?

I have a dataset with three variables, data = c(id, distance, actPct), such that id is in 1:45, -10 <= distance <= 10, and 0 <= actPct <= 1.

I need to run a regression, model0n, on each value of id, with the output of each model0n collected in a new tibble/df. I have done this for a single regression:

model01 <- data %>%
  filter(id == 1) %>%
  filter(distance < 1) %>%
  filter(distance > -4)
model01 <- lm(actPct ~ distance, data = model01)

Example Data

set.seed(42)
# Build plain vectors first, then combine them into one tibble with named columns
id <- sample(1:45, 100, replace = TRUE)
distance <- sample(-4:4, 100, replace = TRUE)
actPct <- runif(100, min = 0, max = 1)
data01 <- tibble(id = id, distance = distance, actPct = actPct)

I expect a new tibble or dataframe that has model01:model45 so I can put all of the regression outputs into a single table.

kath · Accepted Answer

You can use group_by, nest and mutate with map from the tidyverse to accomplish this:

data01 %>% 
  group_by(id) %>% 
  nest() %>% 
  mutate(models = map(data, ~ lm(actPct ~ distance, data = .x)))

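# A tibble: 41 x 3
#       id data     models
#    <int> <list>   <list>
#  1    42 <tibble> <lm>
#  2    43 <tibble> <lm>
#  3    13 <tibble> <lm>
#  4    38 <tibble> <lm>
#  5    29 <tibble> <lm>
#  6    24 <tibble> <lm>
#  7    34 <tibble> <lm>
#  8     7 <tibble> <lm>
#  9    30 <tibble> <lm>
# 10    32 <tibble> <lm>
# ... with 31 more rows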
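To get from the nested models to the single table the question asks for, one option is to tidy each model and unnest the result. The following is only a sketch: it assumes the broom package is installed (it is not used in the answer above), it rebuilds the example data so it runs on its own, and it applies the question's distance window (-4 < distance < 1) before nesting. The names model_table and coefs are made up for illustration.

```r
library(tidyverse)
library(broom)  # assumption: broom is installed; not part of the original answer

set.seed(42)
data01 <- tibble(
  id       = sample(1:45, 100, replace = TRUE),
  distance = sample(-4:4, 100, replace = TRUE),
  actPct   = runif(100, min = 0, max = 1)
)

model_table <- data01 %>%
  filter(distance > -4, distance < 1) %>%  # same window as the question's model01
  group_by(id) %>%
  nest() %>%
  mutate(
    models = map(data, ~ lm(actPct ~ distance, data = .x)),  # one lm per id
    coefs  = map(models, tidy)                               # one row per coefficient
  ) %>%
  ungroup() %>%
  select(id, coefs) %>%
  unnest(coefs) %>%
  arrange(id)

model_table
```

With this toy data many ids have only one or two rows, so some slope estimates and standard errors will be NA or NaN; real data with more observations per id should not have that problem.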

See also the chapter on many models in R for Data Science: https://r4ds.had.co.nz/many-models.html

Data

library(tidyverse)  # provides tibble(), %>%, nest() and map()

set.seed(42)
id <- sample(1:45, 100, replace = TRUE)
distance <- sample(-4:4, 100, replace = TRUE)
actPct <- runif(100, min = 0, max = 1)
data01 <- tibble(id = id, distance = distance, actPct = actPct)
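
Since the question title mentions apply(), the same idea can also be sketched in base R with split() and lapply(). This is an alternative to the tidyverse approach, not part of the accepted answer; by_id, models and coef_table are hypothetical names, and the example data is rebuilt as a plain data.frame so the block runs on its own.

```r
set.seed(42)
data01 <- data.frame(
  id       = sample(1:45, 100, replace = TRUE),
  distance = sample(-4:4, 100, replace = TRUE),
  actPct   = runif(100, min = 0, max = 1)
)

# One data frame per id, then one lm per data frame
by_id  <- split(data01, data01$id)
models <- lapply(by_id, function(d) lm(actPct ~ distance, data = d))

# Stack the coefficients into a single data frame, one row per id
coef_table <- do.call(rbind, lapply(names(models), function(i) {
  cf <- coef(models[[i]])
  data.frame(
    id        = as.integer(i),
    intercept = unname(cf["(Intercept)"]),
    slope     = unname(cf["distance"])  # NA when the slope cannot be estimated
  )
}))

head(coef_table)
```

The list models is named by id, so an individual fit can be pulled out with, for example, models[["42"]] when that id is present in the data.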
