Reputation: 7107
I am working with some data which looks like:
# A tibble: 2 x 3
splits id inner_resamples
<named list> <chr> <named list>
1 <split [20/20]> Resample1 <tibble [6 x 2]>
2 <split [20/20]> Resample2 <tibble [6 x 2]>
What I want to do is to map
over the inner_resamples
column and map
again over the splits
column in each inner_resamples
column. For each list I would like to map
again.
The way to do this is to use the analysis
function from the rsample
package.
map(cv_rolling$inner_resamples$`1`$splits, ~ analysis(.x)) %>% tail()
What I would like to do is to map over each of the outputs and create new data 7 columns:
> map(cv_rolling$inner_resamples$`1`$splits, ~ analysis(.x)) %>% tail()
[[1]]
# A tibble: 2 x 4
time ID Value out
<date> <chr> <dbl> <dbl>
1 2016-12-13 CAT1 796. 1
2 2016-12-14 CAT1 797. 0
[[2]]
# A tibble: 2 x 4
time ID Value out
<date> <chr> <dbl> <dbl>
1 2016-12-15 CAT1 798. 1
2 2016-12-16 CAT1 791. 0
[[3]]
# A tibble: 2 x 4
time ID Value out
<date> <chr> <dbl> <dbl>
1 2016-12-19 CAT1 794. 1
2 2016-12-20 CAT1 796. 0
[[4]]
# A tibble: 2 x 4
time ID Value out
<date> <chr> <dbl> <dbl>
1 2016-12-21 CAT1 795. 0
2 2016-12-22 CAT1 791. 0
[[5]]
# A tibble: 2 x 4
time ID Value out
<date> <chr> <dbl> <dbl>
1 2016-12-23 CAT1 790. 0
2 2016-12-27 CAT1 792. 1
[[6]]
# A tibble: 2 x 4
time ID Value out
<date> <chr> <dbl> <dbl>
1 2016-12-28 CAT1 785. 0
2 2016-12-29 CAT1 783. 0
Expected output would be (for 1 of the outputs)
[[6]]
# A tibble: 2 x 4
time ID Value out NEWCOL
<date> <chr> <dbl> <dbl>
1 2016-12-28 CAT1 785. 0 8677
2 2016-12-29 CAT1 783. 0 8757
However I would like to also do this for each N
in the data:
map(cv_rolling$inner_resamples$`N`$splits, ~ analysis(.x)) %>% tail()
Where N
here can be accessed by:
cv_rolling$inner_resamples[[1]]
cv_rolling$inner_resamples[[2]]
cv_rolling$inner_resamples[[N]]
New Data:
structure(list(time = structure(c(17136, 17137, 17140, 17141,
17142, 17143, 17144, 17147, 17148, 17149, 17150, 17151, 17154,
17155, 17156, 17157, 17158, 17162, 17163, 17164, 17165, 17136,
17137, 17140, 17141, 17142, 17143, 17144, 17147, 17148, 17149,
17150, 17151, 17154, 17155, 17156, 17157, 17158, 17162, 17163,
17164, 17165), class = "Date"), ID = c("CAT1", "CAT1", "CAT1",
"CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT1",
"CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT1",
"CAT1", "CAT1", "CAT2", "CAT2", "CAT2", "CAT2", "CAT2", "CAT2",
"CAT2", "CAT2", "CAT2", "CAT2", "CAT2", "CAT2", "CAT2", "CAT2",
"CAT2", "CAT2", "CAT2", "CAT2", "CAT2", "CAT2", "CAT2"), Value = c(747.919983,
750.5, 762.52002, 759.109985, 771.190002, 776.419983, 789.289978,
789.27002, 796.099976, 797.070007, 797.849976, 790.799988, 794.200012,
796.419983, 794.559998, 791.26001, 789.909973, 791.549988, 785.049988,
782.789978, 771.820007, 56.283112, 56.330643, 57.252861, 56.996159,
58.346195, 58.003925, 58.916634, 59.106773, 59.876858, 59.591648,
59.496574, 59.230362, 60.485325, 60.409275, 60.409275, 60.418777,
60.124058, 60.162071, 59.886375, 59.800812, 59.078251), out = c(0,
1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0)), row.names = c(NA,
-42L), index_quo = ~date, index_time_zone = "UTC", class = c("tbl_time",
"tbl_df", "tbl", "data.frame"))
Also need to run:
library(rsample)
library(purrr)
library(tibbletime)
periods_train <- 2
periods_test <- 1
skip_span <- 1
cv_rolling <- nested_cv(df,
outside = group_vfold_cv(group = "ID"),
inside = rolling_origin(
initial = periods_train,
assess = periods_test,
cumulative = FALSE,
skip = skip_span))
Where the following can be run:
map(cv_rolling$inner_resamples$`2`$splits, ~ analysis(.x))
Which is what I am trying to modify / create new data from.
Upvotes: 2
Views: 183
Reputation: 18541
I am not sure, what kind of function you want to apply to generate NEWCOL
, but here is some toy example for your data which just divides the original Value
column by 10:
cv_rolling %>%
mutate(data = map(inner_resamples, "splits"),
data2 = map_depth(data, 2, rsample::analysis),
data3 = map_depth(data2, 2, ~ mutate(.x, NEWCOL = Value/10)))
If the mutate call is rather complex you could put it in a helper function.
mutate_helper <- function(df) {
mutate(df, NEWCOL = Value/10)
}
cv_rolling %>%
mutate(data = map(inner_resamples, "splits"),
data2 = map_depth(data, 2, rsample::analysis),
data3 = map_depth(data2, 2, mutate_helper))
Upvotes: 1