Reputation: 21212
game_name <- "fungame"
day_from <- 7 # train from
day_to <- 30 # predict to
directory <- paste0("/home/rstudio-doug/analysis/radhoc/ltv_models/models/", game_name)
list.files(directory)
fungame_20200201_day_7_to_day_30.rds
fungame_20200221_day_7_to_day_30.rds
fungame_20200222_day_7_to_day_30.rds
fungame_20200201_day_7_to_day_60.rds
fungame_20200221_day_7_to_day_60.rds
fungame_20200222_day_7_to_day_90.rds
Each of these files is an R list that includes, among other things, a trained model. When I call the script, I would like to get the most recently trained model for the given day_from and day_to values. In the example above that is day 7 to day 30 (my model is trained on in-app engagement 7 days after install and attempts to predict revenue at day 30 after install).
In this case, I would like to select the most recently trained day 7 to day 30 model, which is fungame_20200222_day_7_to_day_30.rds.
One approach I was considering is to split each file name on underscores and store the pieces in a data frame. I could then filter the data frame on day from = 7 and day to = 30 and select the row with the latest date (which.max()?).
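To make that idea concrete, here is a minimal sketch. It assumes every file name follows the pattern <game>_<YYYYMMDD>_day_<from>_to_day_<to>.rds, as in the listing above. The helper name parse_model_files is hypothetical, and the file names are hard-coded so the example runs without access to the directory. It uses a regular expression instead of a plain split so that a game name containing underscores would not break the parsing.

```r
# File names copied from the listing above (hard-coded so the sketch is self-contained).
files <- c(
  "fungame_20200201_day_7_to_day_30.rds",
  "fungame_20200221_day_7_to_day_30.rds",
  "fungame_20200222_day_7_to_day_30.rds",
  "fungame_20200201_day_7_to_day_60.rds",
  "fungame_20200221_day_7_to_day_60.rds",
  "fungame_20200222_day_7_to_day_90.rds"
)

# Hypothetical helper: parse names of the assumed form
# <game>_<YYYYMMDD>_day_<from>_to_day_<to>.rds into a data frame.
parse_model_files <- function(files) {
  pattern <- "^(.+)_([0-9]{8})_day_([0-9]+)_to_day_([0-9]+)\\.rds$"
  m <- regmatches(files, regexec(pattern, files))
  m <- m[lengths(m) == 5]  # keep only names that match the assumed pattern
  if (length(m) == 0) stop("no file names match the expected pattern")
  parts <- do.call(rbind, m)  # one row per file: full match + 4 capture groups
  data.frame(
    file     = parts[, 1],
    game     = parts[, 2],
    trained  = as.Date(parts[, 3], format = "%Y%m%d"),
    day_from = as.integer(parts[, 4]),
    day_to   = as.integer(parts[, 5]),
    stringsAsFactors = FALSE
  )
}

models <- parse_model_files(files)

# Filter on the day window, then take the row with the latest training date.
candidates <- models[models$day_from == 7 & models$day_to == 30, ]
latest <- candidates$file[which.max(as.numeric(candidates$trained))]
latest
```

In the real script, files would come from list.files(directory) and the result would be passed to readRDS(file.path(directory, latest)).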
Is there a conventional way of doing this? This script is part of an ML pipeline, and any suggestions or recommendations are very welcome. More broadly, I am trying to dynamically select the most recently trained model for a given game and target date. Relying on underscore-delimited strings in my pipeline just 'feels' like it might not be the most sound approach.
Upvotes: 0
Views: 33
Reputation: 5138
Here is an adaptation of a solution I used to pull several files. It picks the most recent file based on the creation time reported by file.info(). Something akin to this:
files <- list.files(path_to_files, pattern = "day_7_to_day_30\\.rds$", full.names = TRUE) # escape the dot so it matches a literal "."
files[which.max(file.info(files)$ctime)]
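Edit: I am on Windows, so ctime refers to the file creation time. On Unix-alikes, ctime is the time of the last status change rather than the creation time, so mtime (last modification time) may be the better choice there. Check out ?file.info for more info.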
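As a rough sketch of how this could be parameterised for the pipeline in the question, the pattern can be built from game_name, day_from and day_to. The function name latest_model_by_mtime is hypothetical, and the sketch assumes that a file's modification time reflects when the model was trained, which would not hold if files are copied or touched afterwards. The demo writes empty placeholder files to a temporary directory so that it runs anywhere.

```r
# Hypothetical helper: newest matching file by modification time.
# Assumes game_name contains no regex metacharacters.
latest_model_by_mtime <- function(directory, game_name, day_from, day_to) {
  pattern <- sprintf("^%s_[0-9]{8}_day_%d_to_day_%d\\.rds$",
                     game_name, day_from, day_to)
  files <- list.files(directory, pattern = pattern, full.names = TRUE)
  if (length(files) == 0) return(NA_character_)  # nothing trained for this window
  files[which.max(as.numeric(file.info(files)$mtime))]
}

# Demo with placeholder files in a temporary directory.
demo_dir <- file.path(tempdir(), "ltv_models_demo")
dir.create(demo_dir, showWarnings = FALSE)
old   <- file.path(demo_dir, "fungame_20200201_day_7_to_day_30.rds")
new   <- file.path(demo_dir, "fungame_20200222_day_7_to_day_30.rds")
other <- file.path(demo_dir, "fungame_20200222_day_7_to_day_90.rds")
for (f in c(old, new, other)) saveRDS(list(), f)

# Set modification times explicitly so the demo does not depend on write order.
Sys.setFileTime(old, as.POSIXct("2020-02-01 12:00:00", tz = "UTC"))
Sys.setFileTime(new, as.POSIXct("2020-02-22 12:00:00", tz = "UTC"))

picked <- latest_model_by_mtime(demo_dir, "fungame", 7, 30)
basename(picked)
```

The model itself would then be loaded with readRDS(picked).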
Upvotes: 1