My concern is that when I train an nnet model, the class variable is a factor, but when I call predict, I get a character vector back.
I have taken this example from another post.
library(nnet)
library(C50)
library(caret)
attach(iris)
set.seed(3456)
trainIndex <- createDataPartition(iris$Species, p = .8,
                                  list = FALSE,
                                  times = 1)
irisTrain <- iris[ trainIndex,]
irisTest <- iris[-trainIndex,]
irispred <- nnet(Species ~ ., data=irisTrain, size=10)
predicted <- predict(irispred,irisTest,type="class")
and
> str(irisTrain)
'data.frame': 120 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.6 5 5.4 5 4.4 4.9 5.4 4.8 ...
$ Sepal.Width : num 3.5 3 3.1 3.6 3.9 3.4 2.9 3.1 3.7 3 ...
$ Petal.Length: num 1.4 1.4 1.5 1.4 1.7 1.5 1.4 1.5 1.5 1.4 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.4 0.2 0.2 0.1 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> str(irisTest)
'data.frame': 30 obs. of 5 variables:
$ Sepal.Length: num 4.7 4.6 4.8 4.3 5.4 4.6 5 5 4.6 5.3 ...
$ Sepal.Width : num 3.2 3.4 3.4 3 3.4 3.6 3.5 3.5 3.2 3.7 ...
$ Petal.Length: num 1.3 1.4 1.6 1.1 1.7 1 1.3 1.6 1.4 1.5 ...
$ Petal.Width : num 0.2 0.3 0.2 0.1 0.2 0.2 0.3 0.6 0.2 0.2 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
So in both the training and test data sets Species is a factor, but
str(predicted)
chr [1:30] "setosa" "setosa" "setosa" "setosa" "setosa" ...
The predictions are returned as a character vector. Other data mining packages I use, e.g. C50, return a factor from predict:
> irispred <- C5.0(Species ~ ., data=irisTrain)
> predicted <- predict(irispred,irisTest,type="class")
> str(predicted)
Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 2 1 1 ...
I would prefer a consistent, factor-based format for the output of predict. Simply calling factor() on the character output of nnet's predict would not work, because I cannot guarantee that every level appears among the predicted values. For example, among my 650 cases there is one case with a unique level, which may or may not end up in the test data set, but I want the output of predict to know about that level even when it is not in the test data.
Thanks.
Upvotes: 1
Reputation: 12640
Having played around with nnet.formula, I found that it stores the levels of the class in the lev member of its result. The ordering of the levels is preserved from the input data, even if a level does not occur in the training set. The predicted class can easily be turned back into a factor with factor(predicted_class, levels = model_object$lev). For example:
iris2 <- iris
iris2$Species <- factor(iris2$Species,
                        levels = c("versicolor", "banana", "setosa", "cherry", "virginica"))
iris_pred <- nnet(Species ~ ., data = iris2[trainIndex, ], size = 10)
#Warning message:
#In nnet.formula(Species ~ ., data = iris2[trainIndex, ], size = 10) :
# groups ‘banana’ ‘cherry’ are empty
identical(iris_pred$lev, levels(iris2$Species))
#[1] TRUE
predicted <- predict(iris_pred, iris2[-trainIndex, ], type="class")
predicted_fac <- factor(predicted, levels = iris_pred$lev)
table(iris2[-trainIndex, "Species"], predicted_fac)
#             predicted_fac
#              versicolor banana setosa cherry virginica
#   versicolor         10      0      0      0         0
#   banana              0      0      0      0         0
#   setosa              0      0     10      0         0
#   cherry              0      0      0      0         0
#   virginica           0      0      0      0        10
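To see why the explicit levels argument matters, here is a minimal base-R sketch that needs no fitted model. The vectors lev and predicted are made-up stand-ins (assumptions for illustration only) for model_object$lev and the character output of predict(..., type = "class"):

```r
# Hypothetical stand-ins: 'lev' plays the role of model_object$lev,
# 'predicted' the character vector returned by predict(..., type = "class").
lev <- c("setosa", "versicolor", "virginica")
predicted <- c("setosa", "setosa", "virginica")  # "versicolor" is never predicted

# Plain factor() only knows the values it actually sees,
# so the unpredicted level is dropped.
naive <- factor(predicted)
levels(naive)
# [1] "setosa"    "virginica"

# Supplying the stored levels keeps the full set, in the original order.
restored <- factor(predicted, levels = lev)
levels(restored)
# [1] "setosa"     "versicolor" "virginica"

# The unpredicted level shows up with a count of zero.
table(restored)
```

One thing to watch for: any predicted value that is not in the supplied levels becomes NA, so a quick anyNA(restored) check after the conversion may be worthwhile.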
Upvotes: 1