Stephen Clark

Reputation: 596

predict() on an nnet model returns a character vector, not a factor

My concern is that when I train a nnet, the class is of type factor, but when I do a prediction, I get a chr returned.

I have taken this example from another posting.

library(nnet)   # nnet()
library(C50)    # C5.0(), used for the comparison further down
library(caret)  # createDataPartition()

set.seed(3456)
# Stratified 80/20 split on Species
trainIndex <- createDataPartition(iris$Species, p = .8,
                                  list = FALSE,
                                  times = 1)
irisTrain <- iris[ trainIndex,]
irisTest  <- iris[-trainIndex,]

irispred <- nnet(Species ~ ., data=irisTrain, size=10)
predicted <- predict(irispred,irisTest,type="class")

The structure of the two data sets is:

> str(irisTrain)
'data.frame':   120 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.6 5 5.4 5 4.4 4.9 5.4 4.8 ...
 $ Sepal.Width : num  3.5 3 3.1 3.6 3.9 3.4 2.9 3.1 3.7 3 ...
 $ Petal.Length: num  1.4 1.4 1.5 1.4 1.7 1.5 1.4 1.5 1.5 1.4 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.4 0.2 0.2 0.1 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> str(irisTest)
'data.frame':   30 obs. of  5 variables:
 $ Sepal.Length: num  4.7 4.6 4.8 4.3 5.4 4.6 5 5 4.6 5.3 ...
 $ Sepal.Width : num  3.2 3.4 3.4 3 3.4 3.6 3.5 3.5 3.2 3.7 ...
 $ Petal.Length: num  1.3 1.4 1.6 1.1 1.7 1 1.3 1.6 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.3 0.2 0.1 0.2 0.2 0.3 0.6 0.2 0.2 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

So Species is a factor in both the training and test sets, but the prediction is not:

> str(predicted)
 chr [1:30] "setosa" "setosa" "setosa" "setosa" "setosa" ...

The prediction is returned as a character vector. Other data mining packages I use, e.g. C50, return a factor from predict:

> irispred <- C5.0(Species ~ ., data=irisTrain)
> predicted <- predict(irispred,irisTest,type="class")
> str(predicted)
Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 2 1 1 ...

I would prefer a consistent, factor-based format for the output of predict. Simply converting nnet's character output with factor() would not work, because I cannot guarantee that every level appears among the predicted values. For example, among my 650 cases there is one case with a unique level; it may or may not end up in the test set, but I want the output of predict to carry that level even when it does not appear in the test data.
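A minimal illustration of the concern, using a made-up character vector (pred_chr is hypothetical, not real nnet output) in place of the result of predict:

```r
# Hypothetical predictions in which "versicolor" never occurs
pred_chr <- c("setosa", "setosa", "virginica")

# A plain factor() call only knows about the values it actually sees
pred_fac <- factor(pred_chr)
levels(pred_fac)

# The level set is now smaller than that of the original response
nlevels(pred_fac) < nlevels(iris$Species)
```

Any table or comparison built on pred_fac would silently lack the missing class.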

Thanks.

Upvotes: 1

Views: 915

Answers (1)

Nick Kennedy

Reputation: 12640

After some experimenting with nnet.formula, I found that it stores the levels of the class in the lev member of its result. The ordering of the levels is preserved from the input data, even if a level does not occur in the training set. The predicted class can therefore be converted back into a factor with factor(predicted_class, levels = model_object$lev). For example (reusing trainIndex from the question):

iris2 <- iris
iris2$Species <- factor(iris2$Species,
  levels = c("versicolor", "banana", "setosa", "cherry", "virginica"))
iris_pred <- nnet(Species ~ ., data = iris2[trainIndex, ], size = 10)

#Warning message:
#In nnet.formula(Species ~ ., data = iris2[trainIndex, ], size = 10) :
#  groups ‘banana’ ‘cherry’ are empty

identical(iris_pred$lev, levels(iris2$Species))
#[1] TRUE

predicted <- predict(iris_pred, iris2[-trainIndex, ], type="class")
predicted_fac <- factor(predicted, levels = iris_pred$lev)
table(iris2[-trainIndex, "Species"], predicted_fac)

#            predicted_fac
#             versicolor banana setosa cherry virginica
#  versicolor         10      0      0      0         0
#  banana              0      0      0      0         0
#  setosa              0      0     10      0         0
#  cherry              0      0      0      0         0
#  virginica           0      0      0      0        10
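If this conversion is needed in several places, it could be wrapped in a small helper. The following is only a sketch: predict_factor is a name made up here, not part of nnet, and it assumes the model was fitted through the formula interface with a factor response so that the lev member is populated. The split uses sample() rather than caret so the block is self-contained.

```r
library(nnet)

# Hypothetical helper (not part of nnet): predict classes and return them
# as a factor carrying every level the model was trained with.
predict_factor <- function(model, newdata) {
  pred <- predict(model, newdata, type = "class")  # character vector
  factor(pred, levels = model$lev)                 # restore full level set
}

set.seed(3456)
idx <- sample(nrow(iris), 120)                     # simple random split
fit <- nnet(Species ~ ., data = iris[idx, ], size = 10, trace = FALSE)

pred <- predict_factor(fit, iris[-idx, ])
str(pred)
```

The result then has the same type and level set as what C5.0 returns for the same response, so downstream code can treat both alike.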

Upvotes: 1
