Reputation: 31
I am trying to fit model subsets to a nested data frame. While I have seen many examples of fitting the same model to different groups of data, I have yet to come across an example that fits different models to a dataset that is organized as a nested dataframe.
As an example, I have taken code from the R For Data Science "Many Models" section. Here, the goal was to fit the same model to different countries (groups). What I am hoping to do is to expand this and fit multiple different competing models to the different countries (groups). Ideally, each competing model would then be stored as a new column in the nested dataframe.
Thanks in advance for the help!
# Example code
library(dplyr)
library(ggplot2)
library(modelr)
library(purrr)
library(tidyr)
library(gapminder)
# Create nested data
by_country <- gapminder %>%
group_by(country, continent) %>%
nest()
# Model 1
country_model <- function(df) {
lm(lifeExp ~ year, data = df)
}
# Map model 1 to the data
by_country <- by_country %>%
mutate(model = map(data, country_model))
# Model 2
country_model2 <- function(df) {
lm(lifeExp ~ year + gdpPercap, data = df)
}
# Map Model 2 to the data
by_country <- by_country %>%
mutate(model2 = map(data, country_model2))
UPDATED To clarify my question, I know I could manually do this with a call to mutate for each of the models. I think what I am after is something more flexible, almost something similar to the code below. However, instead of functions "runif", "rnorm" and "rpois", these functions would be calls to model functions. For example "country_model" and "country_model2". Hopefully that helps a bit.
# Example code
sim <- dplyr::frame_data(
~f, ~params,
"runif", list(min = -1, max = -1),
"rnorm", list(sd = 5),
"rpois", list(lambda = 10)
)
sim %>% dplyr::mutate(
samples = invoke_map(f, params, n = 10)
)
Upvotes: 3
Views: 811
Reputation: 451
Here's an approach that uses the invoke_map function mentioned in your update.
It involves creating three functions. These functions: 1. Create a data frame in which your models are specified 2. Use the invoke_map function to apply these models to your data 3. Reshape the results, so that they can be added as columns in your original by_country data frame
# Example code
library(dplyr)
library(ggplot2)
library(modelr)
library(purrr)
library(tidyr)
library(gapminder)
# Create nested data
by_country <- gapminder %>%
group_by(country, continent) %>%
nest()
# Function that creates dataframe suitable for invoke_map function
create_model_df <-
function(x){
dplyr::frame_data(
~model_name, ~f, ~params,
"country_model", "lm", list(formula =as.formula("lifeExp ~ year + gdpPercap"), data = x ),
"country_model2","lm", list(formula =as.formula("lifeExp ~ year"),data = x )
)
}
# Function that applies invoke_map function
apply_models <-
function(x){
x %>%
mutate( model_fit = invoke_map(f, params))
}
# Function that the results from invoke map
reshape_results <-
function(x){
x %>%
select(model_name,model_fit) %>% spread(model_name,model_fit)
}
# Apply these functions
by_country %>%
mutate(model_df = data %>%
map(create_model_df) %>%
map(apply_models) %>%
map(reshape_results)) %>%
unnest(model_df)
#> # A tibble: 142 x 5
#> country continent data country_model country_model2
#> <fctr> <fctr> <list> <list> <list>
#> 1 Afghanistan Asia <tibble [12 x 4]> <S3: lm> <S3: lm>
#> 2 Albania Europe <tibble [12 x 4]> <S3: lm> <S3: lm>
#> 3 Algeria Africa <tibble [12 x 4]> <S3: lm> <S3: lm>
#> 4 Angola Africa <tibble [12 x 4]> <S3: lm> <S3: lm>
#> 5 Argentina Americas <tibble [12 x 4]> <S3: lm> <S3: lm>
#> 6 Australia Oceania <tibble [12 x 4]> <S3: lm> <S3: lm>
#> 7 Austria Europe <tibble [12 x 4]> <S3: lm> <S3: lm>
#> 8 Bahrain Asia <tibble [12 x 4]> <S3: lm> <S3: lm>
#> 9 Bangladesh Asia <tibble [12 x 4]> <S3: lm> <S3: lm>
#> 10 Belgium Europe <tibble [12 x 4]> <S3: lm> <S3: lm>
#> # ... with 132 more rows
Upvotes: 1