Marcos Grzeça
Marcos Grzeça

Reputation: 85

R caret naïve bayes accuracy is null

I have one dataset to train with SVM and Naïve Bayes. SVM works, but Naïve Bayes doesn't work. Follow de source code below:

library(tools)
library(caret)
library(doMC)
library(mlbench)
library(magrittr)
library(caret)

CORES <- 5 #Optional
registerDoMC(CORES) #Optional

load("chat/rdas/2gram-entidades-erro.Rda")

set.seed(10)
split=0.60

maFinal$resposta <- as.factor(maFinal$resposta)
data_train <- as.data.frame(unclass(maFinal[ trainIndex,]))
data_test <- maFinal[-trainIndex,]

treegram25NotNull <- train(x = subset(data_train, select = -c(resposta)),
      y = data_train$resposta, 
      method = "nb",
      trControl = trainControl(method = "cv", number = 5, savePred=T, sampling = "up"))

treegram25NotNull

The final accuracy is null yy

Warning messages: 1: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : There were missing values in resampled performance measures. 2: In train.default(subset(data_train, select = -c(resposta)), data_train$resposta, : missing values found in aggregated results

Any help would be greatly appreciated, thanks.

Upvotes: 1

Views: 347

Answers (1)

Julius Vainora
Julius Vainora

Reputation: 48211

The fix is really simple:

set.seed(10)
split <- 0.60
maFinal[] <- lapply(maFinal, as.factor)

Currently all your variables, except for resposta, are numeric. However, they have only up to 12~ distinct values, meaning that they all actually should be factor variables. Also, many of them are highly unbalanced. Then, when splitting the sample, the issue arises from treating (actually factor) variables with only a single unique value as continuous variables.

Upvotes: 1

Related Questions