mpettis

Generate chain or sequence of steps without naming them

I'd like to use drake to audit a series of validation and cleaning steps for a dataframe. I expect there will be many functions that form a chain: a dataframe is passed in, a validation or cleaning step runs, and the (possibly cleaned) dataframe is passed on to the next step. Is there a way to create such a chain of function calls without explicitly naming each intermediate result in the plan?

A plan may look like this:

plan <- drake_plan(
    raw_data = load_data(),
    clean_data_1 = clean_step_1(raw_data, parms = "some parm"),
    clean_data_2 = clean_step_2(clean_data_1, parms = "some parm"),
    clean_data_3 = clean_step_3(clean_data_2, parms = "some parm"),
    # ... and so on, up to ...
    clean_data_100 = clean_step_100(clean_data_99, parms = "some parm")
)

Is there a way to create this plan without having to come up with the intermediate names clean_data_<n> myself, and have drake generate those names instead? It would be nice to keep the cleaning steps, in order, in a config file (or something similar), without tracking the intermediate data names, so that the steps are simply assembled in the order they appear in that file.
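Outside drake, the chain described above amounts to a left fold over an ordered list of steps. The following is a minimal base-R sketch of that idea, not a drake solution; the three step functions and the sample data are hypothetical stand-ins for real validation and cleaning steps.

```r
# Hypothetical cleaning steps: each takes a data frame and returns a
# (possibly cleaned) data frame.
drop_na_rows <- function(df) df[stats::complete.cases(df), , drop = FALSE]
clip_negative <- function(df) { df$value <- pmax(df$value, 0); df }
round_values <- function(df) { df$value <- round(df$value, 1); df }

# The ordered list of steps plays the role of the "config file".
steps <- list(drop_na_rows, clip_negative, round_values)

raw_data <- data.frame(id = 1:4, value = c(-1.26, 2.34, NA, 3.08))

# Reduce() feeds the output of each step into the next one, so no
# intermediate names are needed (and none are kept).
clean_data <- Reduce(function(df, step) step(df), steps, raw_data)
clean_data

# accumulate = TRUE keeps the raw input plus every intermediate result.
history <- Reduce(function(df, step) step(df), steps, raw_data,
                  accumulate = TRUE)
length(history)  # raw data + one result per step
```

The trade-off is that Reduce() caches nothing: unlike drake targets, every step reruns each time, even when only a later step changed.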

Answers (2)

landau

I can think of a couple of different ways, using rlang::syms() and transformations in drake_plan(). First one:

library(drake)
library(rlang)

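# Symbols for the step functions f1, f2, f3, f4
functions <- syms(paste0("f", seq_len(4)))
# Suffixes used for the target names x_1, ..., x_4
index <- as.numeric(seq_len(4))
# Each step reads the previous target: x_0, x_1, x_2, x_3
inputs <- syms(paste0("x_", index - 1))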

plan <- drake_plan(
  x = target(
    f(x, param = "some param"),
    transform = map(f = !!functions, x = !!inputs, id = !!index, .id = id)
  )
)

plan
#> # A tibble: 4 x 2
#>   target command                      
#>   <chr>  <expr>                       
#> 1 x_1    f1(x_0, param = "some param")
#> 2 x_2    f2(x_1, param = "some param")
#> 3 x_3    f3(x_2, param = "some param")
#> 4 x_4    f4(x_3, param = "some param")

config <- drake_config(plan)
vis_drake_graph(config)

Created on 2019-09-27 by the reprex package (v0.3.0)
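To see what map() does with the three vectors, here is a rough base-R analogue (not drake's actual implementation). It pairs up the i-th function, input, and index the way the transformation does and renders each command as a string, reproducing the target and command columns printed above. Note that x_0 is only referenced, never built, so it still has to be provided separately, for example as its own target such as x_0 = load_data().

```r
index <- seq_len(4)
functions <- paste0("f", index)      # f1, f2, f3, f4
inputs <- paste0("x_", index - 1)    # x_0, x_1, x_2, x_3 (offset by one)

# Like map(), Map() walks the vectors in parallel, one row per element.
rows <- Map(function(f, x, id) {
  c(target = paste0("x_", id),
    command = sprintf('%s(%s, param = "some param")', f, x))
}, functions, inputs, index)

plan_sketch <- do.call(rbind, rows)
plan_sketch
```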

Second one, with the arguments collected in a tibble and passed to map() through .data:

library(drake)
library(rlang)
library(tibble)

f <- syms(paste0("f", seq_len(4)))
index <- as.numeric(seq_len(4))
inputs <- syms(paste0("x_", index - 1))

grid <- tibble(
  f = f,
  index = index,
  inputs = inputs
)

plan <- drake_plan(
  x = target(
    f(inputs, param = "some param"),
    transform = map(.data = !!grid, .id = index)
  )
)

plan
#> # A tibble: 4 x 2
#>   target command                      
#>   <chr>  <expr>                       
#> 1 x_1    f1(x_0, param = "some param")
#> 2 x_2    f2(x_1, param = "some param")
#> 3 x_3    f3(x_2, param = "some param")
#> 4 x_4    f4(x_3, param = "some param")

config <- drake_config(plan)
vis_drake_graph(config)

Created on 2019-09-27 by the reprex package (v0.3.0)
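The question also asks about driving the chain from a config file. One way, sketched here assuming a simple CSV with hypothetical step and param columns, is to read the steps in file order and derive the index and input columns of the grid from the row position.

```r
# Hypothetical config file: one row per cleaning step, in run order.
config_path <- tempfile(fileext = ".csv")
writeLines(c(
  "step,param",
  "clean_step_1,some parm",
  "clean_step_2,some parm",
  "clean_step_3,other parm"
), config_path)

config <- read.csv(config_path, stringsAsFactors = FALSE)

# Row order in the file is the run order; each step reads the previous x_<n>.
index <- seq_len(nrow(config))
grid <- data.frame(
  step = config$step,
  input = paste0("x_", index - 1),
  index = index,
  param = config$param,
  stringsAsFactors = FALSE
)
grid
```

Wrapping step and input in rlang::syms() before building the tibble would give the same kind of grid as above; the param column can then be spliced in much as the next answer does with params.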

mpettis

I made a slight tweak to @landau's answer. It wasn't splicing in the different functions, so I fixed that, and I also added a params argument that is spliced in dynamically, with a separate value for each function.

# https://stackoverflow.com/q/58139703/1022967

library(drake)
library(rlang)
library(tibble)

functions <- syms(paste0("f", seq_len(4)))
index <- as.numeric(seq_len(4))
inputs <- syms(paste0("x_", index - 1))
#params = letters[1:4]
params <- c('{"a":1, "b":"z"}', '{"a":2, "b":"z"}', '{"a":3, "b":"z"}', '{"a":4, "b":"z"}')

grid <- tibble(
  functions = functions,
  index = index,
  inputs = inputs,
  params = params
)

plan <- drake_plan(
  x = target(
    f(inputs, param = p),
    transform = map(.data = !!grid, .id = index, f = !!functions, p = !!params)
  )
)

plan
#> # A tibble: 4 x 2
#>   target command                                  
#>   <chr>  <expr>                                   
#> 1 x_1    f1(x_0, param = "{\"a\":1, \"b\":\"z\"}")
#> 2 x_2    f2(x_1, param = "{\"a\":2, \"b\":\"z\"}")
#> 3 x_3    f3(x_2, param = "{\"a\":3, \"b\":\"z\"}")
#> 4 x_4    f4(x_3, param = "{\"a\":4, \"b\":\"z\"}")

# config <- drake_config(plan)
# vis_drake_graph(config)

Created on 2019-09-27 by the reprex package (v0.3.0)
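The JSON parameter strings above follow a simple pattern, so they could also be generated instead of typed out. A small sketch with sprintf(), assuming the "a" value just counts up with the step number as in the example:

```r
# One JSON parameter string per step; "a" follows the step number.
params <- sprintf('{"a":%d, "b":"z"}', 1:4)
params
```

Inside each step, such a string could then be parsed with jsonlite::fromJSON(), assuming jsonlite is available; that parsing is not part of the answer above.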
