JohannesNE

Reputation: 1363

How should I use {targets} when I have multiple data files?

I have ~50 data files (subjects) that I process individually before I combine them in a data.frame for modelling. I'm unsure how to best use {targets} for this.

I tried using dynamic branching, but I'm unsure how to keep track of subject IDs with this approach. In my current approach, I have all data in a named list where the first-level names are subject IDs, but with targets the names are arbitrary.

I know this is not really a specific question, but I'm hoping to be pointed towards an appropriate solution instead of getting a "correct" answer to the wrong question.

Upvotes: 4

Views: 1228

Answers (1)

Bruno

Reputation: 4150

This is the pattern that I normally use:

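  library(targets)
  library(tarchetypes) # provides tar_files()

  # Packages needed by the target commands
  tar_option_set(packages = c("dplyr", "readxl", "janitor"))

  list(
    # Track every file in the folder; each path becomes its own branch
    tar_files(
      file_paths,
      list.files("file_paths_folder", full.names = TRUE)
    ),
    # One branch per file
    tar_target(
      processed_files,
      file_paths %>%
        readxl::read_excel() %>% # can be any reader: CSV, Parquet, etc.
        janitor::clean_names() %>% # start processing
        mutate(across(c(a, b, c), ~ as.Date(.x, format = "%Y-%m-%d"))), # can be really complex operations
      pattern = map(file_paths)
    )
  )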
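
To keep track of subject IDs with this pattern, one option is to store the ID as a column in each branch instead of relying on branch names. The sketch below is only an illustration under assumptions that are not in the question: each file is named after its subject (e.g. `sub-01.csv`), the files are CSVs, and `subject_id_from_path()` and `read_subject()` are hypothetical helpers you would define yourself.

```r
# Hypothetical helper: derive a subject ID from a file path,
# assuming each file is named after its subject (e.g. "data/sub-01.csv").
subject_id_from_path <- function(path) {
  tools::file_path_sans_ext(basename(path))
}

# Hypothetical helper: read one subject's file and tag every row with its ID.
# read.csv() is a stand-in for whatever reader fits the data.
read_subject <- function(path) {
  data <- read.csv(path, stringsAsFactors = FALSE)
  data$subject_id <- rep(subject_id_from_path(path), nrow(data))
  data
}

# Small demonstration outside of targets, using two temporary files.
demo_dir <- tempfile("subjects_demo")
dir.create(demo_dir)
write.csv(data.frame(x = 1:2), file.path(demo_dir, "sub-01.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(demo_dir, "sub-02.csv"), row.names = FALSE)

paths <- sort(list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE))

# Row-bind the per-subject data frames; the ID column survives the combination.
combined <- do.call(rbind, lapply(paths, read_subject))
```

In the pipeline, the command of `processed_files` could then be `read_subject(file_paths)` with `pattern = map(file_paths)`. As far as I know, targets combines data-frame branches row-wise by default when a downstream target uses `processed_files`, so the `subject_id` column should carry through to the modelling step even though the branch names themselves are arbitrary.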

Upvotes: 6
