Mahsolid

Reputation: 433

Create data partition into training, testing and validation - split in R

I want to split my training data into 70% training, 15% testing and 15% validation. I am using the createDataPartition() function from the caret package, and I currently split the data like this:

library(caret)

train <- read.csv("Train.csv")
test <- read.csv("Test.csv")

# 70% of the rows go to training, the remaining 30% are held out
split <- 0.70
trainIndex <- createDataPartition(train$age, p = split, list = FALSE)
data_train <- train[trainIndex, ]
data_test <- train[-trainIndex, ]

Is there a way to split the data into training, validation and testing sets with createDataPartition(), similar to the following H2O approach?

data.hex <- h2o.importFile("Train.csv")
splits <- h2o.splitFrame(data.hex, c(0.7,0.15), destination_frames = c("train","valid","test"))
train.hex <- splits[[1]]
valid.hex <- splits[[2]]
test.hex  <- splits[[3]]

Upvotes: 4

Views: 19980

Answers (2)

Perceptron

Reputation: 379

Take a look at the related question "train, validation, test split model in caret in R". The idea is to call createDataPartition() twice: first with p=0.7 to get 70% training data and 30% remaining data, then with p=0.5 on the remaining data to get 15% testing and 15% validation.
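The two-step idea could look roughly like the sketch below. The data frame is a made-up stand-in for Train.csv (only the age column comes from the question), and the seed and the names holdout, validIndex and data_valid are illustrative assumptions, not part of the original post.

```r
library(caret)

set.seed(42)  # arbitrary seed, only so the split is reproducible

# Hypothetical stand-in for Train.csv with a numeric 'age' column
train <- data.frame(age = sample(18:80, 1000, replace = TRUE),
                    income = rnorm(1000))

# Step 1: roughly 70% training, 30% held out
trainIndex <- createDataPartition(train$age, p = 0.70, list = FALSE)
data_train <- train[trainIndex, ]
holdout    <- train[-trainIndex, ]

# Step 2: split the held-out 30% in half -> roughly 15% validation, 15% testing
validIndex <- createDataPartition(holdout$age, p = 0.50, list = FALSE)
data_valid <- holdout[validIndex, ]
data_test  <- holdout[-validIndex, ]

# Row counts of the three subsets
c(train = nrow(data_train), valid = nrow(data_valid), test = nrow(data_test))
```

Because createDataPartition() samples within groups of the outcome, the subset sizes are close to 70/15/15 but not necessarily exact.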

Upvotes: 0

lmo

Reputation: 38520

One method uses the sample() function in base R to assign each row to one of three groups:

splitSample <- sample(1:3, size=nrow(data.hex), prob=c(0.7,0.15,0.15), replace = TRUE)
train.hex <- data.hex[splitSample==1,]
valid.hex <- data.hex[splitSample==2,]
test.hex <- data.hex[splitSample==3,]
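Since sample() with prob and replace = TRUE assigns each row independently, the resulting proportions are only approximately 70/15/15. If exact subset sizes matter, a possible variant is to shuffle the row indices once and cut them into consecutive chunks. This is a sketch on a plain data frame; df and the *_df names are hypothetical and not taken from the original answer.

```r
set.seed(42)  # arbitrary seed, only so the split is reproducible

# Hypothetical example data; replace with your own data frame
df <- data.frame(id = 1:1000, age = sample(18:80, 1000, replace = TRUE))

n       <- nrow(df)
n_train <- floor(0.70 * n)
n_valid <- floor(0.15 * n)

# Shuffle the row indices once, then cut into three consecutive chunks
idx <- sample(n)
train_df <- df[idx[1:n_train], ]
valid_df <- df[idx[(n_train + 1):(n_train + n_valid)], ]
test_df  <- df[idx[(n_train + n_valid + 1):n], ]  # whatever rows remain
```

Each row lands in exactly one subset, and the test set absorbs any rounding remainder.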

Upvotes: 9
