I am trying to create training, validation, and test data sets. Before I filter the data frame into the appropriate sets, I want to create vectors listing the rows that each set will contain.
There are 654 observations, and I intend to put 354 in training, 200 in validation, and 100 in test.
Here is the code I used:
```r
x <- 1:654
train_ind <- sample(x, 354)
rest <- x[-train_ind]
length(rest)
#> [1] 300
valid <- sample(rest, 200)
length(valid)
#> [1] 200
test <- rest[-valid]
length(test)
#> [1] 210
```
I don't understand why `test` has length 210! Since `valid` has length 200, I would expect that taking `rest` (length 300) and negatively indexing it by `valid` would leave only 100 elements.
I'd appreciate any input on what I'm doing wrong.
Thank you
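The expectation above treats `rest[-valid]` as "remove the values in `valid`", but a negative index in R removes *positions*. `valid` holds row numbers between 1 and 654, while `rest` has only 300 positions: entries of `valid` larger than 300 are silently ignored, and the entries of 300 or less drop whatever happens to sit at those positions. That is why `test` ends up with more than 100 elements, and it can also still contain rows that are in `valid`. A minimal sketch in base R, with toy values chosen only for illustration, contrasting positional removal with value-based removal via `setdiff()`:

```r
# Toy example: the vector holds row numbers, not positions 1..length()
toy_rest  <- c(3, 8, 12, 15, 20)
toy_valid <- c(8, 20)

# Negative indexing drops *positions*; positions 8 and 20 do not exist in a
# length-5 vector, so nothing is removed
toy_rest[-toy_valid]
#> [1]  3  8 12 15 20

# Value-based removal: keep the elements that are not in toy_valid
setdiff(toy_rest, toy_valid)
#> [1]  3 12 15

# The question's split, removing values instead of positions at each step
x <- 1:654
train_ind <- sample(x, 354)
rest  <- setdiff(x, train_ind)
valid <- sample(rest, 200)
test  <- setdiff(rest, valid)
length(test)
#> [1] 100
```

`rest[!rest %in% valid]` is an equivalent value-based alternative to `setdiff(rest, valid)` here, since `rest` contains no duplicates.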
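You can just shuffle the indices once (sampling without replacement) and then take the first block for testing and the remaining ones for training. Because each index appears exactly once in the shuffled vector, the sets cannot overlap.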
```r
indices <- sample(seq(20))
test <- indices[1:10]
train <- indices[11:20]
train
#> [1] 10 8 12 1 7 20 13 18 4 11
test
#> [1] 19 3 15 2 6 9 16 14 17 5
```
Created on 2021-09-09 by the reprex package (v2.0.0)
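Applied to the question's numbers (654 rows split 354/200/100), the same shuffle-once idea might look like the sketch below. The data frame `df` is a hypothetical stand-in for the real data, and `set.seed(1)` is an arbitrary seed added only to make the shuffle reproducible.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical stand-in for the real 654-row data frame
df <- data.frame(id = 1:654, value = rnorm(654))

# Shuffle all row indices once (a permutation of 1:654)
idx <- sample(nrow(df))

# Consecutive, non-overlapping blocks of the shuffled indices
train_ind <- idx[1:354]
valid_ind <- idx[355:554]
test_ind  <- idx[555:654]

# Subset the data frame by row index
train <- df[train_ind, ]
valid <- df[valid_ind, ]
test  <- df[test_ind, ]

c(nrow(train), nrow(valid), nrow(test))
#> [1] 354 200 100
```

Since every row index appears exactly once in `idx`, slicing it into consecutive blocks guarantees the three sets are disjoint and together cover all 654 rows.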