Reputation: 547
I have very simple code that generates training and test sets for K-fold cross-validation.
I have a matrix x [20x15]. If I set the number of folds n_folds to, e.g., 10, I get a trainData matrix [18x15] and a testData matrix [2x15], which is correct.
Now, if I change the number of folds to n_folds=20, I get a trainData matrix [19x15], which is correct, but for testData R generates a list and not a [1x15] matrix. When I use the as.matrix function, it gives me a [15x1] matrix and not [1x15].
Here is the code for n_folds=20:
library(dplyr)
library(tidyr)
require(stats)
set.seed(19875)
n=20
p=15
real_p=15
x=matrix(rnorm(n*p), nrow=n, ncol=p)
n_folds=20
#Randomly shuffle the data
x=x[sample(nrow(x)),]
folds=cut(seq(1, nrow(x)), breaks = n_folds, labels = FALSE)
#Perform n_folds-fold cross validation
for(i in 1:n_folds){
#segment your data by folds using the which() function
testIndexes=which(folds==i, arr.ind = TRUE)
testData=x[testIndexes,]
trainData=x[-testIndexes,]
}
What would be the simplest way to generate testData as a [1x15] matrix and not a list?
Upvotes: 0
Views: 78
Reputation: 131
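I rewrote your code a bit and came up with this. The key change is drop = FALSE in the subsetting, which keeps the result a matrix even when only one row is selected. I hope it is useful: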
library ( dplyr )
library ( tidyr )
library ( stats )
library ( magrittr )
set.seed ( 19875 )
N <- 20
P <- 15
X <- matrix ( rnorm ( N * P ), N )
N_Folds <- 5
Folds <- rep ( 1:N_Folds, length.out = N ) %>% sample
for ( Fold in 1:N_Folds ){
Validation <- which ( Fold == Folds )
Valid_Data <- X [ Validation,, drop = FALSE ]
Train_Data <- X [ -Validation,, drop = FALSE ]
}
Train_Data %>% dim
Valid_Data %>% dim
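To see the effect of drop = FALSE in isolation, here is a minimal sketch on a small made-up matrix (the name m and its 4 x 3 shape are assumptions for illustration, not taken from the question):

```r
# Hypothetical 4 x 3 matrix, used only for illustration
m <- matrix(1:12, nrow = 4, ncol = 3)

# Default subsetting: selecting a single row drops the dim attribute,
# so the result is a plain vector rather than a matrix
one_row_default <- m[2, ]

# drop = FALSE keeps the dim attribute, so the result is a 1 x 3 matrix
one_row_kept <- m[2, , drop = FALSE]

is.matrix(one_row_default)  # FALSE
dim(one_row_default)        # NULL
dim(one_row_kept)           # 1 3
```

With 20 folds on 20 rows, every test fold has exactly one row, so this is the case that matters for the original question.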
David
Upvotes: 1
Reputation: 7610
Your issue here is that you are extracting rows with x[testIndexes,], and when only one row is selected R drops the dimensions and returns a vector. To enforce a matrix, wrap the result in a call to matrix. Your attempt with as.matrix used the default way of turning a vector into a matrix, which produces a single column. Specify what you want the rows and columns to be. I've presumed you want the dimensions to be length(testIndexes) by p, but if it's something else, you can use this as the form of the correct answer. Just sub in what you want it to be.
set.seed(19875)
n=20
p=15
real_p=15
x=matrix(rnorm(n*p), nrow=n, ncol=p)
n_folds=20
#Randomly shuffle the data
x=x[sample(nrow(x)),]
folds=cut(seq(1, nrow(x)), breaks = n_folds, labels = FALSE)
#Perform n_folds-fold cross validation
for(i in 1:n_folds){
#segment your data by folds using the which() function
testIndexes=which(folds==i, arr.ind = TRUE)
testData=matrix(x[testIndexes,], length(testIndexes), p)
trainData=x[-testIndexes,]
}
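As a rough illustration of why as.matrix gave a single column, here is a small sketch on a made-up vector (v is a hypothetical stand-in for one extracted row, not data from the question):

```r
# Hypothetical vector standing in for a single row extracted from x
v <- c(0.5, -1.2, 3.0)

# as.matrix() treats a plain vector as a column: length(v) x 1
col_mat <- as.matrix(v)

# Stating the number of rows explicitly gives a 1 x length(v) matrix
row_mat <- matrix(v, nrow = 1)

# t() on a plain vector also returns a 1 x length(v) matrix
row_mat2 <- t(v)

dim(col_mat)   # 3 1
dim(row_mat)   # 1 3
dim(row_mat2)  # 1 3
```

Either matrix(v, nrow = 1) or t(v) should work for the single-row case. The matrix(x[testIndexes,], length(testIndexes), p) form above has the advantage of also behaving correctly when a fold has more than one row.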
Upvotes: 1