kputschko

Reputation: 816

Managing multiple models and run times in R

I'm building dozens of predictive models in an effort to identify a champion model. I'm working with gigabytes of data, so tracking run time is important.

I'd like to build all my models in a list-type format, so I don't have to manage all the different model names within the Global Environment. However, it seems that the only way to get timings per model is to have separate named objects.

Here's a basic method that approaches what I'm looking for:

library(tidyverse)

# Basic Approach

Time_1 <- system.time(
  Model_1 <- lm(am ~ disp, mtcars)
)

Time_2 <- system.time(
  Model_2 <- lm(am ~ disp + cyl, mtcars)
)

# etc. for dozens more

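Time_List <- 
  mget(ls(pattern = "^Time_\\d+$")) %>%  # match Time_1, Time_2, ... but not Time_List itself
  map(~ as.list(unclass(.x))) %>%        # turn each proc_time into a plain named list
  bind_rows(.id = "Time_Name")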

However, as you can see, I have to manually name each model and time record. What I'm looking for is something similar to the table produced with the following code, where "xxx" is an actual record of run time.

# Tribble Output
tribble(
  ~Model_Name, ~Model_Function, ~Run_Time,
  "Model_1", lm(am ~ disp, mtcars), "xxx",
  "Model_2", lm(am ~ disp + cyl, mtcars), "xxx"
)

# A tibble: 2 × 3
  Model_Name Model_Function Run_Time
       <chr>         <list>    <chr>
1    Model_1       <S3: lm>      xxx
2    Model_2       <S3: lm>      xxx

I'd appreciate any input provided, regardless of packages used.

Upvotes: 0

Views: 304

Answers (1)

alistaire

Reputation: 43354

If you assign inside the system.time() call, you keep both the timing and the fitted object. Store the pair in a list column, then unpack it into separate columns:

library(tidyverse)

# tibble() replaces the deprecated data_frame()
tibble(formula = c(mpg ~ wt, mpg ~ wt + hp)) %>% 
    mutate(model_time = map(formula, ~{
               # the assignment inside system.time() keeps the fitted model
               time <- system.time(model <- lm(.x, mtcars))
               lst(model, time)
           }), 
           model = map(model_time, 'model'), 
           time = map(model_time, 'time')) %>% 
    select(-model_time)
#> # A tibble: 2 × 3
#>         formula    model            time
#>          <list>   <list>          <list>
#> 1 <S3: formula> <S3: lm> <S3: proc_time>
#> 2 <S3: formula> <S3: lm> <S3: proc_time>

Because every column is still a list column, the printout doesn't show much, but the models and timings are all there and can be extracted further.
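As a rough sketch of that extraction, assuming the same mtcars formulas as above, the elapsed seconds can be pulled out of each proc_time into an ordinary numeric column. The names `timed` and `elapsed_sec` are made up for this illustration.

```r
library(tidyverse)

# Rebuild the model/time list columns as in the code above
timed <- tibble(formula = c(mpg ~ wt, mpg ~ wt + hp)) %>%
    mutate(model_time = map(formula, ~{
               time <- system.time(model <- lm(.x, mtcars))
               lst(model, time)
           }),
           model = map(model_time, "model"),
           time = map(model_time, "time")) %>%
    select(-model_time)

# Each proc_time is a named numeric vector; keep its "elapsed" entry
# (hypothetical column name: elapsed_sec)
timed <- timed %>%
    mutate(elapsed_sec = map_dbl(time, ~ .x[["elapsed"]]))

timed %>% select(formula, elapsed_sec)
```

With a plain numeric column, the usual verbs such as arrange(elapsed_sec) work for comparing models by run time.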

An equivalent alternative:

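# tibble() replaces the deprecated data_frame()
tibble(formula = c(mpg ~ wt, mpg ~ wt + hp)) %>% 
    mutate(model_time = map(formula, ~{
               time <- system.time(model <- lm(.x, mtcars))
               # one-row tibble per model, with both results as list columns
               tibble(model = list(model), 
                      time = list(time))
           })) %>% 
    unnest(cols = model_time)  # recent tidyr versions expect cols to be named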
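To get closer to the table sketched in the question, one possible variation is to start from a named list of formulas and let the names become the Model_Name column. This is only a sketch: `specs`, `fit_timed`, and `results` are hypothetical names, and lm() on mtcars stands in for whatever models are actually being compared.

```r
library(tidyverse)

# Hypothetical named list of model specifications; names become Model_Name
specs <- list(
    Model_1 = am ~ disp,
    Model_2 = am ~ disp + cyl
)

# Hypothetical helper: fit one model and return it, together with its
# elapsed time in seconds, as a one-row tibble
fit_timed <- function(spec, data) {
    run_time <- system.time(fit <- lm(spec, data = data))
    tibble(Model_Function = list(fit),
           Run_Time = run_time[["elapsed"]])
}

# One row per model; .id copies the list names into Model_Name
results <- specs %>%
    map(fit_timed, data = mtcars) %>%
    bind_rows(.id = "Model_Name")

results
```

Nothing here lands in the global environment except the single `results` tibble, so adding another model only means adding another entry to `specs`.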

Upvotes: 1
