Reputation: 105
I'm trying to classify text with a naive Bayes classifier, and I want to use k-fold cross-validation to validate the classification results. However, I'm still confused about how to use k-fold cross-validation. As I understand it, k-fold divides the data into k subsets; one of the k subsets is used as the test set, and the other k-1 subsets are combined to form the training set. I also assume the training data must be labeled so the classifier can be trained. So does k-fold cross-validation require labeled data? And what about unlabeled data?
Upvotes: 0
Views: 2094
Reputation: 64
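Yes, cross-validating a classifier needs labeled data, because the labels are used both to train on each training set and to score the predictions on each test set. For unlabeled data you would have to use an unsupervised method such as clustering. For naive Bayes, maybe this code will help you: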
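% kfolds is a user-defined helper (not a MATLAB built-in); it must split
% Features and Label identically so that rows stay aligned.
[testF, trainF] = kfolds(Features,k);
[testL, trainL] = kfolds(Label,k);
n = size(Features,1);              % total number of samples
predict_Class = zeros(1,k);        % correct predictions per fold
for i=1:k
    LabelTrain = trainL{i};
    LabelTest = testL{i};
    FeaturesTrain = trainF{i};
    FeaturesTest = testF{i};
    % NaiveBayes.fit is the older API; newer releases use fitcnb instead
    nb = NaiveBayes.fit(FeaturesTrain,LabelTrain);
    Class = predict(nb,FeaturesTest);
    % assumes numeric labels; use strcmp for cell arrays of strings
    predict_Class(i) = sum(Class==LabelTest);
end
predict_all = sum(predict_Class)/n;   % overall accuracy across all folds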
The kfolds function separates your data into k folds.
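Since kfolds is not part of MATLAB itself, here is a minimal sketch of what such a helper could look like. The name, signature, and splitting strategy are assumptions chosen to match the calls above, not a reference implementation. It assigns rows to folds in a fixed round-robin order without shuffling, so calling it once on Features and once on Label picks the same rows for each fold.

```matlab
function [test, train] = kfolds(data, k)
% KFOLDS  Hypothetical helper: split the rows of DATA into k folds.
%   test{i}  holds the rows assigned to fold i,
%   train{i} holds all remaining rows.
%   The split is deterministic (no shuffling), so separate calls on a
%   feature matrix and a label vector stay row-aligned.
    n = size(data, 1);
    foldId = mod((0:n-1)', k) + 1;   % row r goes to fold mod(r-1, k) + 1
    test = cell(1, k);
    train = cell(1, k);
    for i = 1:k
        isTest = (foldId == i);      % logical mask for the rows of fold i
        test{i} = data(isTest, :);
        train{i} = data(~isTest, :);
    end
end
```

For example, `[te, tr] = kfolds((1:10)', 5)` puts rows 1 and 6 in `te{1}` and the other eight rows in `tr{1}`. If you want a random split, permute Features and Label with the same index vector before calling the helper, rather than shuffling inside it.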
cheers
Upvotes: 0