Reputation: 99
I want to train and test a J48 decision tree in R. Here is my code:
library("RWeka")
data <- read.csv("try.csv")
resultJ48 <- J48(classificationTry~., data)
summary(resultJ48)
But I want to split my data into 70% train and 30% test. How can I do that with the J48 algorithm?
Many thanks!
Upvotes: 2
Views: 10824
Reputation: 3436
If you don't want to use any packages other than RWeka, you can do it with runif. Each row is assigned independently, so the split is only approximately 70/30:
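library("RWeka")
data <- read.csv("try.csv")
# J48 needs a nominal class, so make sure the class column is a factor
data$classificationTry <- as.factor(data$classificationTry)
set.seed(42)  # optional: makes the random split reproducible
# One uniform random number per row; rows <= 0.7 (roughly 70%) go to training
randoms <- runif(nrow(data))
resultJ48 <- J48(classificationTry ~ ., data[randoms <= 0.7, ])
# Predict on the remaining rows and cross-tabulate actual vs. predicted
PredTest <- predict(resultJ48, newdata = data[randoms > 0.7, ])
table(data[randoms > 0.7, ]$classificationTry, PredTest)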
Upvotes: 0
Reputation: 9285
Use the sample.split() function of the caTools package. It is more lightweight than the caret package (which is a meta package, if I remember correctly):
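library(caTools)
library(RWeka)
data <- read.csv("try.csv")
# J48 needs a nominal class, so convert the class column to a factor up front
data$classAttribute <- as.factor(data$classAttribute)
# Split on the class labels so both parts keep similar class proportions
spl <- sample.split(data$classAttribute, SplitRatio = 0.7)
dataTrain <- subset(data, spl == TRUE)
dataTest <- subset(data, spl == FALSE)
resultJ48 <- J48(classAttribute ~ ., dataTrain)
dataTest.pred <- predict(resultJ48, newdata = dataTest)
# Confusion matrix: actual classes in rows, predictions in columns
table(dataTest$classAttribute, dataTest.pred)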
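If you also want a single number out of that table, the accuracy is the diagonal over the total. A minimal sketch in base R; the two label vectors here are made up for illustration and stand in for dataTest$classAttribute and dataTest.pred:

```r
# Hypothetical labels standing in for the actual and predicted test classes.
actual    <- factor(c("yes", "no", "yes", "yes", "no"), levels = c("no", "yes"))
predicted <- factor(c("yes", "no", "no",  "yes", "no"), levels = c("no", "yes"))

# Confusion matrix: actual classes in rows, predictions in columns.
confusion <- table(actual, predicted)

# Correct predictions sit on the diagonal.
accuracy <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(accuracy)
```

This only works as written when both vectors are factors with the same levels, so that the table is square.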
Upvotes: 4
Reputation: 284
This is Java rather than R, but it shows the logic:
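// Assumes trainingSet is a weka.core.Instances holding the full data set.
// The split is positional (first 70% of rows); shuffle first if the rows are ordered.
int trainSize = (int) Math.round(trainingSet.numInstances() * 0.7); // 70% split
int testSize = trainingSet.numInstances() - trainSize;
Instances train = new Instances(trainingSet, 0, trainSize);
Instances test = new Instances(trainingSet, trainSize, testSize);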
Implement the same logic in R. Hope it helps :)
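A rough base-R translation of the same logic might look like the sketch below. It uses the built-in iris data as a stand-in for try.csv, and the shuffle step and seed are my additions (the positional split alone would just take the first 70% of rows in file order):

```r
# Stand-in data; in the question this would be read.csv("try.csv").
data <- iris

# Assumed seed, only so the shuffle is reproducible.
set.seed(1)

# Shuffle the rows so the positional split below is effectively random.
shuffled <- data[sample(nrow(data)), ]

# 70% of the rows go to training, the rest to testing.
trainSize <- round(nrow(shuffled) * 0.7)
testSize  <- nrow(shuffled) - trainSize

train <- head(shuffled, trainSize)  # first trainSize rows
test  <- tail(shuffled, testSize)   # remaining testSize rows

print(c(train = nrow(train), test = nrow(test)))
```

From there, train and test can be passed to J48() and predict() in the same way as in the other answers. Unlike the runif approach, this gives an exact 70/30 row count.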
Upvotes: 1