Reputation: 7127
I want to create a sequence of lists in R based on a time-dependent sequence. The closest I have found to what I want is the following:
rBayesianOptimization::KFold(seq(1:30), nfolds = 5,
stratified = TRUE, seed = 0)
Which gives:
[[1]]
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2
[38] 2 2 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 6 6 6 7 7 7 7 7 7
[75] 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 9 9 9 9 9 9 9 9 9 11 11 11 11 11 11 14 14 14 14 14 14
[112] 14 14 14 14 14 14 14 14 14 14 14 23 23 25 25 25 25 25
[[2]]
[1] 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 5 5 5 5 5
[38] 5 5 5 5 5 5 5 6 6 6 6 6 7 7 8 10 10 10 10 10 10 11 11 11 11 11 11 11 11 11 11 12 12 13 13 13 13
[75] 13 13 15 15 16 16 16 16 16 17 17 17 17 17 18 18 18 18 18 18 18 18 18 18 19 19 19 19 19 19 20 20 21 21 21 22 22
[112] 22 22 23 23 24 24 25 26 26 27 28 28 29 30
[[3]]
[1] 2 2 3 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 8
[38] 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 10 10 10 10 10 10 10 10 10 10 11 11 11 12 12 12 12 12 12 12 12 12
[75] 12 12 12 12 12 13 13 13 13 15 15 15 15 15 15 16 16 16 16 16 17 17 17 17 17 18 18 19 19 19 20 21 21 21 22 22 22
[112] 23 23 24 24 26 27 27
[[4]]
[1] 1 2 2 3 3 3 3 3 3 6 6 6 6 6 8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 10 10 10 10 10 11 12 12
[39] 12 13 13 13 13 13 13 13 13 15 15 15 15 15 15 15 15 16 16 16 16 16 17 17 17 17 18 19 19 19 20 20 20 20 20 20 20 20
[77] 21 21 21 21 22 22 23 23 24 26 26 27 28 29
[[5]]
[1] 1 1 1 1 1 1 1 2 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 6 6 7 7 7 7 7 7 7 8 8 8 9 9
[39] 9 9 9 9 14 14 14 14 14 14 14 14 14 14 14 14 14 14 24 24 25 25 25
I would like to keep the order of the original data, without shuffling it or dropping any observations.
However, I would like to keep the "structure" of the lists above. The closest I have come to a solution is:
library(zoo)  # provides rollapply()
f <- function(x){
  r = rollapply(x, width = 5, FUN = print)
  return(r)
}
f(seq(1:30))
This gives the values I want, just not in the list structure shown previously:
Output:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 2 3 4 5 6
[3,] 3 4 5 6 7
[4,] 4 5 6 7 8
[5,] 5 6 7 8 9
[6,] 6 7 8 9 10
[7,] 7 8 9 10 11
[8,] 8 9 10 11 12
[9,] 9 10 11 12 13
[10,] 10 11 12 13 14
[11,] 11 12 13 14 15
[12,] 12 13 14 15 16
[13,] 13 14 15 16 17
[14,] 14 15 16 17 18
[15,] 15 16 17 18 19
[16,] 16 17 18 19 20
[17,] 17 18 19 20 21
[18,] 18 19 20 21 22
[19,] 19 20 21 22 23
[20,] 20 21 22 23 24
[21,] 21 22 23 24 25
[22,] 22 23 24 25 26
[23,] 23 24 25 26 27
[24,] 24 25 26 27 28
[25,] 25 26 27 28 29
[26,] 26 27 28 29 30
The desired output would be something similar to t(f(seq(1:30))), but with the list structure:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
[1,] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
[2,] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
[3,] 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
[4,] 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
[5,] 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
[,21] [,22] [,23] [,24] [,25] [,26]
[1,] 21 22 23 24 25 26
[2,] 22 23 24 25 26 27
[3,] 23 24 25 26 27 28
[4,] 24 25 26 27 28 29
[5,] 25 26 27 28 29 30
I have also tried splitting the matrix, without luck:
d <- f(seq(1:30))
split(d, seq(ncol(d)))
EDIT:
Expected output:
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
[[2]]
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
[[3]]
[1] 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
[[4]]
[1] 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
[[5]]
[1] 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
where [[1]], [[2]], [[3]], [[4]], [[5]] are lists.
SessionInfo()
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8
[5] LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8 LC_PAPER=C.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] igraph_1.2.4.2 rBayesianOptimization_1.1.0 MlBayesOpt_0.3.4 data.table_1.12.8
[5] Matrix_1.2-17 splitstackshape_1.4.8 rvest_0.3.5 xml2_1.2.2
[9] chron_2.3-54 forecast_8.9 forcats_0.4.0 readr_1.3.1
[13] tibble_2.1.3 tidyverse_1.3.0 geosphere_1.5-10 imputeTS_3.0
[17] xgboost_666.6.4.1 tsfeatures_1.0.1 rsample_0.0.5.9000 purrr_0.3.3
[21] directlabels_2018.05.22 ggplot2_3.2.1 tidyquant_0.5.8 quantmod_0.4-15
[25] TTR_0.23-5 PerformanceAnalytics_1.5.3 xts_0.11-2 zoo_1.8-6
[29] lubridate_1.7.4 tidyr_1.0.0 stringr_1.4.0 dplyr_0.8.3
loaded via a namespace (and not attached):
[1] nlme_3.1-142 fs_1.3.1 httr_1.4.1 tools_3.6.1 backports_1.1.5 R6_2.4.1
[7] DBI_1.0.0.9004 lazyeval_0.2.2 colorspace_1.4-1 nnet_7.3-12 withr_2.1.2 sp_1.3-2
[13] tidyselect_0.2.5 curl_4.3 compiler_3.6.1 cli_1.1.0 labeling_0.3 tseries_0.10-47
[19] scales_1.1.0 lmtest_0.9-37 fracdiff_1.4-2 quadprog_1.5-8 digest_0.6.23 pkgconfig_2.0.3
[25] lhs_1.0.1 dbplyr_1.4.2 rlang_0.4.2 readxl_1.3.1 rstudioapi_0.10 farver_2.0.1
[31] generics_0.0.2 jsonlite_1.6 magrittr_1.5 GPfit_1.0-8 Rcpp_1.0.3 Quandl_2.10.0
[37] munsell_0.5.0 lifecycle_0.1.0 furrr_0.1.0 stringi_1.4.3 plyr_1.8.4 grid_3.6.1
[43] parallel_3.6.1 listenv_0.7.0 crayon_1.3.4 lattice_0.20-38 haven_2.2.0 hms_0.5.2.9000
[49] zeallot_0.1.0 pillar_1.4.2 ranger_0.11.2 codetools_0.2-16 reprex_0.3.0 urca_1.3-0
[55] glue_1.3.1 stinepack_1.4 modelr_0.1.5 vctrs_0.2.0 selectr_0.4-2 foreach_1.4.7
[61] cellranger_1.1.0 gtable_0.3.0 future_1.15.1 assertthat_0.2.1 broom_0.5.2 e1071_1.7-3
[67] class_7.3-15 timeDate_3043.102 iterators_1.0.12 globals_0.12.4 ellipsis_0.3.0
Upvotes: 1
Views: 48
Reputation: 887501
We can use asplit (recently introduced in base R, available in R >= 3.6.0) and specify the MARGIN to split on (1 for rows, 2 for columns):
asplit(d, 2)
#[[1]]
#[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
#[[2]]
#[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
#[[3]]
# [1] 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
#[[4]]
# [1] 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
#[[5]]
# [1] 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
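To make the MARGIN argument concrete, here is a minimal sketch on a small matrix. The matrix `m` is made up for illustration and is not part of the question's data. Note that asplit() returns a list array whose elements may carry a dim attribute, so as.vector() is used here to get plain vectors.

```r
# Hypothetical 3 x 2 matrix, filled column-wise:
#      [,1] [,2]
# [1,]    1    4
# [2,]    2    5
# [3,]    3    6
m <- matrix(1:6, nrow = 3)

# MARGIN = 1 splits by row, MARGIN = 2 splits by column
by_row <- asplit(m, 1)
by_col <- asplit(m, 2)

length(by_row)  # 3, one element per row
length(by_col)  # 2, one element per column

# Drop any dim attribute to get plain vectors
as.vector(by_row[[1]])  # 1 4
as.vector(by_col[[1]])  # 1 2 3
```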
Another option is to split with col:
split(d, col(d))
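As a rough illustration of why this works: col() returns a matrix of the same shape holding each cell's column index, and split() groups the values by that index. The small matrix `m` below is a made-up example, not the question's data.

```r
# Hypothetical 3 x 2 matrix, filled column-wise
m <- matrix(1:6, nrow = 3)

# Every cell is labelled with its column number
col(m)
#      [,1] [,2]
# [1,]    1    2
# [2,]    1    2
# [3,]    1    2

# Grouping by that label yields one plain vector per column
res <- split(m, col(m))
res
# $`1`
# [1] 1 2 3
#
# $`2`
# [1] 4 5 6
```

Unlike asplit, the result here is a named list (names "1", "2", ...); unname(res) drops the names if they are not wanted.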
The asplit call can also be moved into the function itself:
f <- function(x){
asplit(rollapply(x, width = 5, FUN = I), 2)
}
f(1:30) # returns the expected list
Another option is embed from base R:
n <- 5
asplit(embed(1:30, n)[, n:1 ], 2)
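The column reversal [, n:1] is needed because embed() puts the most recent value first in each row. A small sketch on a shorter, made-up input (`x <- 1:6` with a window of 3) shows the intermediate steps; the final as.vector() call is an optional addition that strips the dim attribute asplit() may leave on each element.

```r
x <- 1:6   # hypothetical short series
n <- 3     # window width

# Each row is a window, but in reverse time order
e <- embed(x, n)
e
#      [,1] [,2] [,3]
# [1,]    3    2    1
# [2,]    4    3    2
# [3,]    5    4    3
# [4,]    6    5    4

# Reverse the columns so each row reads in the original order
w <- e[, n:1]

# Split by column and convert each piece to a plain vector
res <- lapply(asplit(w, 2), as.vector)
res[[1]]  # 1 2 3 4
res[[2]]  # 2 3 4 5
res[[3]]  # 3 4 5 6
```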
Upvotes: 2