Ari

Reputation: 125

R: Why doesn't the negative index work to create the complement set?

I am trying to create training, validation, and test data sets. Before I filter the data frame into the appropriate data sets, I am trying to create vectors listing the rows that each data set will contain.
There are 654 observations, and I intend to place 354 in training, 200 in validation, and 100 in test. Here is the code I used:

x <- 1:654
train_ind <- sample(x, 354)
rest <- x[-train_ind]
length(rest)
[1] 300
valid <- sample(rest, 200)
length(valid)
[1] 200
test <- rest[-valid]
length(test)
[1] 210

I don't understand why the length of test is 210!
I would think that since valid has length 200, negative-indexing rest (length 300) by valid would leave only 100 elements.
I appreciate any input into what I'm doing wrong.
Thank you

Upvotes: 1

Views: 364

Answers (1)

danlooo

Reputation: 10637

You can shuffle the indices once (sample() without a size argument returns a random permutation, i.e. sampling without replacement) and then take the first block for testing and the remaining ones for training.

indices <- sample(seq(20))
test <- indices[1:10]
train <- indices[11:20]

train
#>  [1] 10  8 12  1  7 20 13 18  4 11
test
#>  [1] 19  3 15  2  6  9 16 14 17  5
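
The same idea might be extended to the three-way split from the question. This is only a sketch, assuming the 354/200/100 sizes stated there; the names n, valid_ind, and test_ind are illustrative and do not come from the original posts.

```r
n <- 654
indices <- sample(n)            # random permutation of 1..n

# Consecutive, non-overlapping slices of the shuffled indices
train_ind <- indices[1:354]
valid_ind <- indices[355:554]
test_ind  <- indices[555:654]

# Sizes of the three index vectors
lengths(list(train = train_ind, valid = valid_ind, test = test_ind))
```

Because the slices never overlap, no complement has to be computed, so the negative-indexing step is avoided entirely.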

Created on 2021-09-09 by the reprex package (v2.0.0)
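
As for why the original code returns more than 100 elements: a negative index removes elements by position, not by value. x[-train_ind] works because in x <- 1:654 every value equals its position. rest has only 300 positions, while valid holds values as large as 654. R silently ignores negative indices that point past the end of a vector, so rest[-valid] only drops the positions named by those values of valid that are at most 300. The sketch below illustrates this and removes by value with setdiff() instead; the seed is an arbitrary assumption, added only for reproducibility.

```r
set.seed(1)                     # arbitrary seed, not from the question
x <- 1:654
train_ind <- sample(x, 354)
rest <- x[-train_ind]           # fine: values and positions of x coincide
valid <- sample(rest, 200)

# Values of valid above length(rest) are out-of-range positions and are ignored
n_dropped <- sum(valid <= length(rest))
length(rest[-valid]) == length(rest) - n_dropped   # TRUE

# Remove by value rather than by position
test <- setdiff(rest, valid)
length(test)                    # 100
```

An equivalent form is rest[!rest %in% valid], which may be preferable if the vector could contain duplicates, since setdiff() also drops repeated values.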

Upvotes: 1
