Reputation: 3
I'm computing the model outputs for a linear regression for a dependent variable with 45 different id values. How can I use tidy (dplyr, apply, etc.) code to accomplish this?
I have a dataset with three variables data
= c(id
, distance
, actPct)
such that id
== 1:45; -10 <= distance
<= 10; 0 <= actsPct
<= 1.
I need to run a regression, model0n
, on each value of id
, such that model0n
has out put in a new tibble/df. I have completed it for a single regression:
model01 <- data %>%
filter(id == 1) %>%
filter(distance < 1) %>%
filter(distance > -4)
model01 <- lm(data = model01, actPct~distance)
Example Data
set.seed(42)
id <- as.tibble(sample(1:45,100,replace = T))
distance <- as.tibble(sample(-4:4,100,replace = T))
actPct <- as.tibble(runif(100, min=0, max=1))
data01 <- bind_cols(id=id, distance=distance, actPct=actPct)
attr(data01, "col.names") <- c("id", "distance", "actPct")
I expect a new tibble or dataframe that has model01
:model45
so I can put all of the regression outputs into a single table.
Upvotes: 0
Views: 382
Reputation: 7724
You can use group_by
, nest
and mutate
with map
from the tidyverse
to accomplish this:
data01 %>%
group_by(id) %>%
nest() %>%
mutate(models = map(data, ~ lm(actPct ~ distance, data = .x)))
# A tibble: 41 x 3
# id data models
# <int> <list> <list>
# 1 42 <tibble [3 x 2]> <S3: lm>
# 2 43 <tibble [4 x 2]> <S3: lm>
# 3 13 <tibble [2 x 2]> <S3: lm>
# 4 38 <tibble [4 x 2]> <S3: lm>
# 5 29 <tibble [2 x 2]> <S3: lm>
# 6 24 <tibble [5 x 2]> <S3: lm>
# 7 34 <tibble [5 x 2]> <S3: lm>
# 8 7 <tibble [3 x 2]> <S3: lm>
# 9 30 <tibble [2 x 2]> <S3: lm>
# 10 32 <tibble [2 x 2]> <S3: lm>
# ... with 31 more rows
See also the chapter in R for R for Data Science about many models: https://r4ds.had.co.nz/many-models.html
Data
set.seed(42)
id <- sample(1:45, 100, replace = T)
distance <- sample(-4:4, 100, replace = T)
actPct <- runif(100, min = 0, max = 1)
data01 <- tibble(id = id, distance = distance, actPct = actPct)
Upvotes: 2