Reputation: 18490
In my use case, I have a subfolder with some input CSV files that are generated outside of the targets pipeline. The names and number of the CSV files are not known in advance.
The "last" target is an Excel file that contains all the CSV files.
When I call tar_make(), I want the pipeline to rebuild the Excel file if any CSV file has been altered or added.
Here is what I tried so far. The following is just the setup for the example:
folder <- tempdir()
curr_wd <- getwd()
on.exit(setwd(curr_wd))
setwd(folder)
dir.create("sheets") |> suppressWarnings()
write.csv(iris, file.path("sheets", "iris.csv"))
write.csv(mtcars, file.path("sheets", "mtcars.csv"))
The following is the _targets.R file. I try to generate a list of targets dynamically. First, the files in the subfolder are defined as targets. Then, I add a target that processes these files (in this example, it creates an Excel file).
library(targets)
tar_option_set(
packages = "openxlsx"
)
lapply(list.files("sheets"), \(fn) {
tar_target_raw(
fn,
paste0("sheets/", fn),
format="file_fast"
)
}) |>
append(
tar_target(
result.xlsx,
{write.xlsx(lapply(list.files("sheets", full.names=TRUE), read.csv), "result.xlsx"); "result.xlsx"},
format = "file"
)
)
The problem is that this code does not reflect the dependency structure. How can I make the last target depend on the CSV files?
Upvotes: 2
Views: 154
Reputation: 6921
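One approach would be to track all CSV files in a single file target and pass that target to the function that builds the Excel file, so that the Excel target depends on it: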
library(targets)
tar_option_set(
packages = "openxlsx"
)
build_xlsx <- \(datafile_names,
                datafile_contents ## unused; only creates the dependency on the 1st target
                ) {
  lapply(datafile_names, \(fn) read.csv(fn)) |>
    write.xlsx(file = 'result.xlsx')
}
datafile_names <- list.files('sheets', ## directory containing CSVs
full.names = TRUE
)
targets <-
list(
tar_target(datafile_contents, datafile_names, format = 'file_fast'),
tar_target(result.xlsx, build_xlsx(datafile_names, datafile_contents))
)
(Note that you can supply a whole vector of file names to the first target, as done here, unless you need to check the status of each file separately.)
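As far as I can tell, build_xlsx above hands an unnamed list to write.xlsx, so the sheets would get default names rather than the file names. A minimal sketch of one way to name them, assuming a hypothetical helper read_csv_sheets (not part of targets or openxlsx) and a throwaway demo directory:

```r
# Hypothetical helper: read CSV files into a list named after the files,
# so that each Excel sheet can carry the name of its source file.
read_csv_sheets <- function(paths) {
  sheets <- lapply(paths, read.csv)
  names(sheets) <- tools::file_path_sans_ext(basename(paths))
  sheets
}

# Small self-contained demo in a temporary directory (assumed layout).
demo_dir <- file.path(tempdir(), "sheets_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(iris, file.path(demo_dir, "iris.csv"), row.names = FALSE)
write.csv(mtcars, file.path(demo_dir, "mtcars.csv"), row.names = FALSE)

csv_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
sheets <- read_csv_sheets(csv_paths)
names(sheets)
```

Inside build_xlsx, the lapply() call could then be swapped for read_csv_sheets(datafile_names) before piping into write.xlsx. Excel limits sheet names to 31 characters, so very long file names may need shortening.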
Dependency graph (drawn with tar_visnetwork()) of the above example after changing one of the data files:
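To reproduce that check yourself, something along these lines should work. It is only a sketch and assumes that it is run from the project root containing the _targets.R above and the sheets folder, that the pipeline has been built once already, and that the visNetwork package is installed for the graph:

```r
library(targets)

# Simulate an upstream change to one of the CSV files
# (assumes sheets/iris.csv already exists from the question's setup).
write.csv(head(iris), file.path("sheets", "iris.csv"))

# Names of the targets that targets considers outdated after the change.
tar_outdated()

# Interactive dependency graph; outdated targets are highlighted.
tar_visnetwork()

# Rebuild only what is outdated.
tar_make()
```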
Upvotes: 1
Reputation: 1280
To track changes to the files in a directory, you can do something like the following, where the directory you are interested in is tracked with tarchetypes::tar_files:
library(targets)
tar_script({
  suppressWarnings(dir.create("sheets"))
  write.csv(iris, file.path("sheets", "iris.csv"))
  write.csv(mtcars, file.path("sheets", "mtcars.csv"))
  list(
    tarchetypes::tar_files(
      files_to_track,
      # pattern is a regular expression, not a glob
      command = list.files(path = "sheets", pattern = "\\.csv$", full.names = TRUE),
      format = "file"
    ),
    targets::tar_target(readin, lapply(files_to_track, function(x) read.csv(x))),
    targets::tar_target(
      change_files,
      command = lapply(readin, function(x) dplyr::mutate(x, did_change = "This should do what is expected"))
    )
  )
})
tar_make()
#> ▶ dispatched target files_to_track_files
#> ● completed target files_to_track_files [0.001 seconds]
#> ▶ dispatched branch files_to_track_a07734b933957420
#> ● completed branch files_to_track_a07734b933957420 [0 seconds]
#> ▶ dispatched branch files_to_track_2617a68a7fc1f9b6
#> ● completed branch files_to_track_2617a68a7fc1f9b6 [0 seconds]
#> ● completed pattern files_to_track
#> ▶ dispatched target readin
#> ● completed target readin [0.001 seconds]
#> ▶ dispatched target change_files
#> ● completed target change_files [0.004 seconds]
#> ▶ ended pipeline [0.046 seconds]
Created on 2024-07-15 with reprex v2.1.1
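Adapted to the Excel output from the question, the same idea might look like the _targets.R below. This is a sketch under assumptions rather than a tested reprex: the target names csv_files and result_xlsx are made up for illustration, and openxlsx is assumed to be installed.

```r
library(targets)
library(tarchetypes)

tar_option_set(packages = "openxlsx")

list(
  # One dynamic branch per CSV file; the file listing is refreshed
  # on every tar_make(), so added files are picked up as well.
  tar_files(
    csv_files,
    list.files("sheets", pattern = "\\.csv$", full.names = TRUE)
  ),
  # Referring to csv_files makes this target depend on all CSV files.
  tar_target(
    result_xlsx,
    {
      sheets <- lapply(csv_files, read.csv)
      names(sheets) <- tools::file_path_sans_ext(basename(csv_files))
      write.xlsx(sheets, "result.xlsx")
      "result.xlsx" # a file target must return the path it wrote
    },
    format = "file"
  )
)
```

Because result_xlsx uses csv_files without a pattern, it receives the paths of all branches at once and reruns whenever any of them changes.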
Upvotes: 1