I'd like to use drake to audit a series of validation and cleaning steps for a data frame. I expect a long chain of functions: each one takes a data frame, validates or cleans it, and passes the (possibly cleaned) data frame on to the next step. Is there a way to build this chain of function calls without explicitly naming every intermediate result in the plan?
A plan may look like this:
plan <- drake_plan(
raw_data = load_data(),
clean_data_1 = clean_step_1(raw_data, parms = "some parm"),
clean_data_2 = clean_step_2(clean_data_1, parms = "some parm"),
clean_data_3 = clean_step_3(clean_data_2, parms = "some parm"),
...
clean_data_100 = clean_step_100(clean_data_99, parms = "some parm")
)
Is there a way to create this plan without having to come up with the intermediate names clean_data_<n> myself, and have drake generate them instead? Ideally I would keep the cleaning steps, in order, in a config file (or something similar) and not have to track the data names at all, so that the plan is assembled in exactly the order the steps appear in the config file.
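For comparison only (this is plain R, not drake): a chain where each step takes a data frame and returns a possibly cleaned one is a left fold, so base R's Reduce() can run an ordered list of steps without naming any intermediate. The step functions below are hypothetical stand-ins, and unlike drake targets nothing here is cached or audited.

```r
# Hypothetical steps: each takes a data frame and returns a (possibly
# cleaned) data frame, so they can be chained in any order.
drop_missing <- function(df) df[stats::complete.cases(df), , drop = FALSE]

check_positive <- function(df) {
  # Validation only: stop if the check fails, otherwise pass df through.
  stopifnot(all(df$value > 0))
  df
}

round_values <- function(df) {
  df$value <- round(df$value, 1)
  df
}

# Ordered steps, e.g. as they might be listed in a config file.
steps <- list(drop_missing, check_positive, round_values)

raw_data <- data.frame(id = 1:4, value = c(1.26, NA, 3.14, 2.71))

# Left fold: round_values(check_positive(drop_missing(raw_data))),
# with no intermediate names.
clean_data <- Reduce(function(df, step) step(df), steps, raw_data)

# accumulate = TRUE keeps raw_data plus every intermediate result.
history <- Reduce(function(df, step) step(df), steps, raw_data,
                  accumulate = TRUE)
```

The accumulate = TRUE variant is the closest plain-R analogue to an audit trail, but it keeps everything in memory and reruns every step each time.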
I can think of a couple of different ways using rlang::syms() and transformations in drake_plan(). First one:
library(drake)
library(rlang)
functions <- syms(paste0("f", seq_len(4)))  # symbols f1, ..., f4
index <- as.numeric(seq_len(4))             # 1, ..., 4, used in target names
inputs <- syms(paste0("x_", index - 1))     # x_0, ..., x_3: each step's input
plan <- drake_plan(
x = target(
f(x, param = "some param"),
transform = map(f = !!functions, x = !!inputs, id = !!index, .id = id)
)
)
plan
#> # A tibble: 4 x 2
#> target command
#> <chr> <expr>
#> 1 x_1 f1(x_0, param = "some param")
#> 2 x_2 f2(x_1, param = "some param")
#> 3 x_3 f3(x_2, param = "some param")
#> 4 x_4 f4(x_3, param = "some param")
config <- drake_config(plan)
vis_drake_graph(config)
Created on 2019-09-27 by the reprex package (v0.3.0)
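Both approaches rest on two rlang ideas: syms() turns a character vector into a list of symbols (object names, not strings), and !! splices a value into an expression as it is captured. The sketch below uses only rlang and the hypothetical names f1 and x_0 to show the splice and to build one command of the plan by hand with call2().

```r
library(rlang)

# A character vector of function names becomes a list of symbols.
fns <- syms(paste0("f", 1:2))

# !! splices the list of symbols into the captured call, which is what
# transform = map(f = !!functions, ...) relies on inside drake_plan().
spliced <- expr(map(f = !!fns))

# Build one step's command by hand: the symbol becomes the function
# name and the previous target becomes its first argument.
cmd <- call2(fns[[1]], sym("x_0"), param = "some param")
print(cmd)
```

The printed call has the same shape as the first row of the plan, which is roughly what drake's map() transformation generates for each combination of grouping values.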
Second one:
library(drake)
library(rlang)
library(tibble)
f <- syms(paste0("f", seq_len(4)))
index <- as.numeric(seq_len(4))
inputs <- syms(paste0("x_", index - 1))
grid <- tibble(
f = f,
index = index,
inputs = inputs
)
plan <- drake_plan(
x = target(
f(inputs, param = "some param"),
transform = map(.data = !!grid, .id = index)
)
)
plan
#> # A tibble: 4 x 2
#> target command
#> <chr> <expr>
#> 1 x_1 f1(x_0, param = "some param")
#> 2 x_2 f2(x_1, param = "some param")
#> 3 x_3 f3(x_2, param = "some param")
#> 4 x_4 f4(x_3, param = "some param")
config <- drake_config(plan)
vis_drake_graph(config)
Created on 2019-09-27 by the reprex package (v0.3.0)
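To tie this back to the config-file idea in the question, the same map() pattern can take its steps from an ordered list of function names. A minimal sketch, assuming the names would come from a hypothetical file cleaning_steps.txt with one name per line (a literal vector stands in for it here), and that load_data() and the clean_step_* functions exist by the time make() runs; drake_plan() only builds the commands, so they need not be defined yet.

```r
library(drake)
library(rlang)

# Hypothetical config: step names in execution order. In practice this
# might be steps <- readLines("cleaning_steps.txt").
steps <- c("clean_step_1", "clean_step_2", "clean_step_3")

n <- length(steps)
index <- as.numeric(seq_len(n))
functions <- syms(steps)
# Step i reads the output of step i - 1; the first step reads raw_data.
inputs <- syms(c("raw_data", paste0("clean_data_", index[-n])))

plan <- drake_plan(
  raw_data = load_data(),
  clean_data = target(
    f(input, parms = "some parm"),
    transform = map(f = !!functions, input = !!inputs, id = !!index, .id = id)
  )
)
plan
```

With this layout, reordering or adding a step means editing only the config. One consequence is that target names follow positions, so inserting a step shifts the names of every later target, and drake will likely treat those as new targets and rebuild them.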
I made a slight tweak to @landau's answer. It wasn't splicing in the different functions, so I pass them in explicitly with f = !!functions, and I added a params argument (p = !!params) that is also dynamic but specific to each function.
# https://stackoverflow.com/q/58139703/1022967
library(drake)
library(rlang)
library(tibble)
functions <- syms(paste0("f", seq_len(4)))
index <- as.numeric(seq_len(4))
inputs <- syms(paste0("x_", index - 1))
#params = letters[1:4]
params = c('{"a":1, "b":"z"}', '{"a":2, "b":"z"}', '{"a":3, "b":"z"}', '{"a":4, "b":"z"}')
grid <- tibble(
functions = functions,
index = index,
inputs = inputs,
params = params
)
plan <- drake_plan(
x = target(
f(inputs, param = p),
transform = map(.data = !!grid, .id = index, f = !!functions, p = !!params)
)
)
plan
#> # A tibble: 4 x 2
#> target command
#> <chr> <expr>
#> 1 x_1 f1(x_0, param = "{\"a\":1, \"b\":\"z\"}")
#> 2 x_2 f2(x_1, param = "{\"a\":2, \"b\":\"z\"}")
#> 3 x_3 f3(x_2, param = "{\"a\":3, \"b\":\"z\"}")
#> 4 x_4 f4(x_3, param = "{\"a\":4, \"b\":\"z\"}")
# config <- drake_config(plan)
# vis_drake_graph(config)
Created on 2019-09-27 by the reprex package (v0.3.0)
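Because each param in this plan is a JSON string, every step function has to parse it itself. A minimal sketch, assuming the jsonlite package and a hypothetical step that scales a value column by a and records b; f1 through f4 are placeholders that all reuse it.

```r
library(jsonlite)

# Hypothetical step: `param` is the JSON string spliced into the plan,
# so it is parsed into a named list before use.
scale_step <- function(data, param) {
  p <- fromJSON(param)           # '{"a":2, "b":"z"}' -> list with a = 2, b = "z"
  data$value <- data$value * p$a
  data$tag <- p$b
  data
}

# Placeholders for the real step functions named in the plan.
f1 <- f2 <- f3 <- f4 <- scale_step

# Running the first two commands of the plan by hand.
x_0 <- data.frame(value = c(1, 2, 3))
x_1 <- f1(x_0, param = '{"a":1, "b":"z"}')
x_2 <- f2(x_1, param = '{"a":2, "b":"z"}')
```

Since the JSON string is part of each target's command, editing one step's parameters should invalidate that step's target, and drake then rebuilds downstream targets only if their inputs actually change.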