Ville

Reputation: 547

R: generate a matrix instead of a list

I have some very simple code that generates training and test sets for K-fold cross-validation. I have a matrix X[20x15], and if I set the number of folds n_folds to e.g. 10, I get trainingData[18x15] and testData[2x15], which is correct.

Now, if I change the number of folds to n_folds=20, I get a trainingData[19x15], which is correct, but for testData R generates a list rather than a [1x15] matrix. When I use the as.matrix function, it gives me a [15x1] matrix, not [1x15].

Here is the code for n_folds=20:

library(dplyr)
library(tidyr)
require(stats)
set.seed(19875)
n=20
p=15
real_p=15
x=matrix(rnorm(n*p), nrow=n, ncol=p)
n_folds=20
# Randomly shuffle the rows
x=x[sample(nrow(x)),]
folds=cut(seq(1, nrow(x)), breaks = n_folds, labels = FALSE)
# Perform n_folds-fold cross validation
for(i in 1:n_folds){
  # segment the data by fold using the which() function
  testIndexes=which(folds==i, arr.ind = TRUE)
  testData=x[testIndexes,]
  trainData=x[-testIndexes,]
}

What would be the simplest way to get testData as a [1x15] matrix rather than a list?

Upvotes: 0

Views: 78

Answers (2)

David.

Reputation: 131

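I rewrote your code a bit and came up with the version below. The key change is drop = FALSE in the subsetting calls, which keeps the result a matrix even when only one row is selected. I hope it is useful: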

library ( dplyr )
library ( tidyr )
library ( stats )
library ( magrittr )

set.seed ( 19875 )

N <- 20
P <- 15

X <- matrix ( rnorm ( N * P ), N )
N_Folds <- 5

# Assign each row to a fold, then shuffle the assignment
Folds <- rep ( 1:N_Folds, length.out = N ) %>% sample

for ( Fold in 1:N_Folds ){
  Validation <- which ( Fold == Folds )
  Valid_Data <- X [ Validation,, drop = FALSE ]
  Train_Data <- X [ -Validation,, drop = FALSE ]
}

# Dimensions of the last fold (N = 20, N_Folds = 5)
Train_Data %>% dim   # 16 15
Valid_Data %>% dim   #  4 15
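
The run above uses 5 folds, so every validation set has 4 rows. To see what drop = FALSE changes in the one-row case from the question (n_folds = 20), here is a minimal sketch on a toy matrix. The names one_row_dropped and one_row_kept are made up for illustration.

```r
# Toy 20 x 15 matrix, same shape as the one in the question
X <- matrix(seq_len(20 * 15), nrow = 20, ncol = 15)

# Default subsetting drops the dimensions of a single row: plain vector
one_row_dropped <- X[3, ]

# drop = FALSE keeps the dimensions: 1 x 15 matrix
one_row_kept <- X[3, , drop = FALSE]

is.matrix(one_row_dropped)  # FALSE
dim(one_row_dropped)        # NULL
dim(one_row_kept)           # 1 15
```

Since drop = FALSE has no effect when several rows are selected, it should be safe to leave it in place for any number of folds.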

David

Upvotes: 1

De Novo

Reputation: 7610

The issue is that you are extracting rows with x[testIndexes,], and when there is only one row, R drops the matrix dimensions and returns a plain vector (not a list). To enforce a matrix, use a call to matrix. Your attempt with as.matrix treats a vector as a single column, which is why you got [15x1]. Instead, specify the number of rows and columns explicitly. I've presumed you want the dimensions to be length(testIndexes) by p; if you want something else, substitute it in the same form.

set.seed(19875)
n=20
p=15
real_p=15
x=matrix(rnorm(n*p), nrow=n, ncol=p)
n_folds=20
#Randomly shuffle the data 
x=x[sample(nrow(x)),]
folds=cut(seq(1, nrow(x)), breaks = n_folds, labels = FALSE)
#Perform n_folds-fold cross validation
for(i in 1:n_folds){
  #segment your data by folds using the which() function 
  testIndexes=which(folds==i, arr.ind = TRUE)
  testData=matrix(x[testIndexes,], length(testIndexes), p)
  trainData=x[-testIndexes,]
}
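
As a small illustration of the shapes involved, the sketch below compares as.matrix with two ways of getting a single-row matrix from a vector. The vector v is a made-up stand-in for one extracted row, not data from the question.

```r
# Stand-in for a single row extracted from a matrix (hypothetical values)
v <- c(0.5, -1.2, 3.0)

# as.matrix() turns a vector into a one-column matrix
col_shape <- dim(as.matrix(v))           # 3 1

# Giving matrix() an explicit row count produces a one-row matrix
row_shape <- dim(matrix(v, nrow = 1))    # 1 3

# t() applied to a plain vector also returns a one-row matrix
t_shape <- dim(t(v))                     # 1 3
```

Any of the one-row forms would work inside the loop; passing both dimensions to matrix, as in the code above, has the advantage of also handling folds with more than one row.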

Upvotes: 1
