Generate chain or sequence of steps without naming them

Question

I'd like to use drake to audit a series of validation and cleaning steps for a data frame. I expect many functions to form a chain: each one takes a data frame, either validates or cleans it, and passes the (possibly cleaned) data frame on to the next step. Is there a way to create this chain of function calls without explicitly naming each intermediate result in the plan?

A plan may look like this:

plan <- drake_plan(
    raw_data = load_data(),
    clean_data_1 = clean_step_1(raw_data, parms = "some parm"),
    clean_data_2 = clean_step_2(clean_data_1, parms = "some parm"),
    clean_data_3 = clean_step_3(clean_data_2, parms = "some parm"),
    # ...
    clean_data_100 = clean_step_100(clean_data_99, parms = "some parm")
)

Is there a way to create this plan without having to come up with the intermediate names (clean_data_1, clean_data_2, ...) myself, and have drake generate them instead? Ideally I would keep the cleaning steps, in order, in a config file (or something similar) and have the chain assembled in that order, without having to track the intermediate data names.

landau · Accepted Answer

I can think of a couple of different ways to do this, using rlang::syms() and transformations in drake_plan(). First one:

library(drake)
library(rlang)

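# Symbols for the step functions: f1, f2, f3, f4
functions <- syms(paste0("f", seq_len(4)))
# Step numbers, used below as target-name suffixes (x_1, ..., x_4)
index <- as.numeric(seq_len(4))
# The input of each step is the previous target: x_0, ..., x_3
inputs <- syms(paste0("x_", index - 1))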

plan <- drake_plan(
  x = target(
    f(x, param = "some param"),
    transform = map(f = !!functions, x = !!inputs, id = !!index, .id = id)
  )
)

plan
#> # A tibble: 4 x 2
#>   target command
#>   <chr>  <expr>
#> 1 x_1    f1(x_0, param = "some param")
#> 2 x_2    f2(x_1, param = "some param")
#> 3 x_3    f3(x_2, param = "some param")
#> 4 x_4    f4(x_3, param = "some param")

config <- drake_config(plan)
vis_drake_graph(config)

Created on 2019-09-27 by the reprex package (v0.3.0)

Second one:

library(drake)
library(rlang)
library(tibble)

f <- syms(paste0("f", seq_len(4)))
index <- as.numeric(seq_len(4))
inputs <- syms(paste0("x_", index - 1))

grid <- tibble(
  f = f,
  index = index,
  inputs = inputs
)

plan <- drake_plan(
  x = target(
    f(inputs, param = "some param"),
    transform = map(.data = !!grid, .id = index)
  )
)

plan
#> # A tibble: 4 x 2
#>   target command
#>   <chr>  <expr>
#> 1 x_1    f1(x_0, param = "some param")
#> 2 x_2    f2(x_1, param = "some param")
#> 3 x_3    f3(x_2, param = "some param")
#> 4 x_4    f4(x_3, param = "some param")

config <- drake_config(plan)
vis_drake_graph(config)

Created on 2019-09-27 by the reprex package (v0.3.0)
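To tie this back to the config-file idea in the question, here is a minimal sketch built on the first approach. It assumes the config is just an ordered list of cleaning-function names: the hard-coded `steps` vector stands in for something like `readLines()` on a hypothetical `steps.txt` with one function name per line. It also adds the raw data as a target named `clean_data_0`, so that the first step has an input to consume. `load_data()`, the `clean_step_*` functions and `parms` come from the question; the file name is made up for illustration.

```r
library(drake)
library(rlang)

# Ordered step names. In practice these could come from a config file,
# e.g. steps <- readLines("steps.txt")  (hypothetical file, one name per line).
steps <- c("clean_step_1", "clean_step_2", "clean_step_3")

functions <- syms(steps)                          # step functions as symbols
index <- as.numeric(seq_along(steps))             # 1, 2, 3: target-name suffixes
inputs <- syms(paste0("clean_data_", index - 1))  # clean_data_0, ..., clean_data_2

plan <- drake_plan(
  clean_data_0 = load_data(),  # seed of the chain: the raw data
  clean_data = target(
    f(input, parms = "some parm"),
    transform = map(f = !!functions, input = !!inputs, id = !!index, .id = id)
  )
)

plan
```

This should yield targets clean_data_1 to clean_data_3, each calling the next step on the previous target (for example clean_step_2(clean_data_1, parms = "some parm")). Reordering, inserting or deleting a step then only means editing the list of names; the intermediate target names are regenerated from their positions.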
