Reputation: 41
I'm trying to take a dataset and partition it into three pieces: 60% training, 20% testing, and 20% validation.
part1 <- createDataPartition(fullDataSet$classe, p=0.8, list=FALSE)
validation <- fullDataSet[-part1,]
workingSet <- fullDataSet[part1,]
When I do the same thing to partition again:
inTrain <- createDataPartition(workingSet$classe, p=.75, list=FALSE)
I get the error:
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
Is there a way to either a) create 3 partitions of different sizes, or b) do a nested partition like the one I tried? I've also considered c) using sample() instead, but this is for a class in which the instructor only uses createDataPartition, and we have to show our code. Does anyone have any advice?
Upvotes: 4
Views: 4717
Reputation: 1
# m.d is the full data frame
# METHOD 1: EQUAL SPLITS (commented out)
# allind <- sample(1:nrow(m.d),nrow(m.d))
# #split in three parts
# trainind <- allind[1:round(length(allind)/3)]
# valind <- allind[(round(length(allind)/3)+1):round(length(allind)*(2/3))]
# testind <- allind[(round(length(allind)*(2/3))+1):length(allind)]
set.seed(1234)  # make the shuffle reproducible
# METHOD 2: 60-30-10 SPLIT (the proportions the code below actually uses)
allind <- sample(1:nrow(m.d),nrow(m.d))  # shuffled row indices
n <- length(allind)
trainEnd <- round(n*0.6)            # last position of the training block
valEnd <- trainEnd + round(n*0.3)   # last position of the validation block
trainind <- allind[1:trainEnd]
valind <- allind[(trainEnd+1):valEnd]
testind <- allind[(valEnd+1):n]     # whatever is left, about 10%
m.dTRAIN <- m.d[trainind,]
m.dVAL <- m.d[valind,]
m.dTEST <- m.d[testind,]
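As a rough sketch of the same index arithmetic for arbitrary proportions (for example the 60/20/20 split asked about): split_rows below is a hypothetical helper written for this post, not part of any package, and the built-in iris data stands in for m.d. The cut points are computed once with cumsum so the pieces cannot overlap.

```r
# Hypothetical helper: shuffle row indices, then cut them by cumulative proportions.
split_rows <- function(n, props = c(train = 0.6, val = 0.2, test = 0.2)) {
  stopifnot(abs(sum(props) - 1) < 1e-8)   # proportions must sum to 1
  shuffled <- sample(n)                   # random permutation of 1..n
  ends <- round(cumsum(props) * n)        # last position of each piece
  ends[length(ends)] <- n                 # guard against rounding drift
  starts <- c(1, head(ends, -1) + 1)      # first position of each piece
  pieces <- Map(function(s, e) shuffled[seq_len(e - s + 1) + s - 1], starts, ends)
  names(pieces) <- names(props)
  pieces
}

set.seed(1234)
idx <- split_rows(nrow(iris))             # iris is a stand-in for m.d
print(sapply(idx, length))                # size of each piece

train <- iris[idx$train, ]
val   <- iris[idx$val, ]
test  <- iris[idx$test, ]
```

Note that this is a plain random split: unlike createDataPartition, it does not try to keep the class proportions the same in each piece.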
Upvotes: 0
Reputation: 272
I was wondering the same thing and came up with a not-very-elegant solution that seems to work.
So, in my case I wanted to create a training dataset with 60% of the data and test and validation datasets with 20% each. Here is how I've done it:
set.seed(1234)
inTraining <- createDataPartition(mydata$FLAG, p=0.6, list=FALSE)
training.set <- mydata[inTraining,]
Totalvalidation.set <- mydata[-inTraining,]
# Split the remaining 40% of the data in half: 20% testing and 20% validation
inValidation <- createDataPartition(Totalvalidation.set$FLAG, p=0.5, list=FALSE)
testing.set <- Totalvalidation.set[inValidation,]
validation.set <- Totalvalidation.set[-inValidation,]
It looks like it gives me the right datasets, and I will be testing them today. Hope it works for you, and if somebody else has a more elegant answer, please share! :)
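For anyone who wants to check the proportions, here is a minimal sketch of the same nested approach on the built-in iris data. Here iris stands in for mydata and Species for the FLAG column (both are assumptions for illustration only), and the caret package is assumed to be installed. The second p is relative to what is left after the first split, so 0.5 of the remaining 40% gives 20% each; for an 80/20 first split, p = 0.75 on the remainder gives 60% of the total, as in the question.

```r
library(caret)  # provides createDataPartition

set.seed(1234)
# First split: 60% training, 40% held back.
inTraining <- createDataPartition(iris$Species, p = 0.6, list = FALSE)
training.set <- iris[inTraining, ]
rest <- iris[-inTraining, ]

# Second split: halve the held-back 40% into testing and validation.
inTesting <- createDataPartition(rest$Species, p = 0.5, list = FALSE)
testing.set <- rest[inTesting, ]
validation.set <- rest[-inTesting, ]

# Share of the original rows that ended up in each piece.
sizes <- c(training = nrow(training.set),
           testing = nrow(testing.set),
           validation = nrow(validation.set))
print(sizes / nrow(iris))

# Class counts in the training piece; createDataPartition samples within each class.
print(table(training.set$Species))
```

Because createDataPartition samples within each level of the outcome, the class balance should stay roughly the same across all three pieces.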
Upvotes: 5