karmabob

Reputation: 105

Class probabilities in neural networks

I use the caret package with a multi-layer perceptron.

My dataset consists of a labelled output value, which can be A, B, or C. The input vector consists of 4 variables.

I use the following lines of code to calculate the class probabilities for each input value:

fit <- train(device ~ ., data = dataframetrain[1:100, ], method = "mlp",
             trControl = trainControl(classProbs = TRUE))
p <- predict(fit, newdata = dataframetest, type = "prob")

I thought that the class probabilities for each record should sum to one, but I get the following:

rowSums(p)
#        1        2        3        4        5        6        7        8 
# 1.015291 1.015265 1.015291 1.015291 1.015291 1.014933 1.015011 1.015291 
#        9       10       11       12       13       14       15       16 
# 1.014933 1.015206 1.015291 1.015291 1.015291 1.015224 1.015011 1.015291 

Can anybody help me? I don't know what I did wrong.

Upvotes: 0

Views: 2922

Answers (2)

cfh

Reputation: 4666

I don't know how much flexibility the caret package offers in these choices, but the standard way to make a neural net produce outputs which sum to one is to use the softmax function as the activation function in the output layer.
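For illustration, a row-wise softmax can be applied to a matrix of raw output values. This is just a sketch, not caret's own code; the matrix out below is made-up numbers:

# Row-wise softmax: exponentiate and normalize so each row sums to one.
softmax <- function(x) {
  ex <- exp(x - max(x))  # subtract the max for numerical stability
  ex / sum(ex)
}

out <- matrix(c(2.1, 0.5, -1.2,   # made-up raw output-layer values
                0.3, 1.8,  0.4),
              nrow = 2, byrow = TRUE)
probs <- t(apply(out, 1, softmax))
rowSums(probs)
# [1] 1 1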

Upvotes: 0

thie1e

Reputation: 3688

There's probably nothing wrong; it just seems that caret returns the raw values of the output-layer neurons without converting them to probabilities (correct me if I'm wrong). When using the RSNNS::mlp function outside of caret, the rows of the predictions also don't sum to one.

Since all output neurons have the same activation function, the outputs can be converted to probabilities by dividing each prediction by its row sum (see this question).

This behavior seems to hold when using method = "mlp" or method = "mlpWeightDecay", but when using method = "nnet" the predictions do sum to one.

Example:

library(RSNNS)

data(iris)
# shuffle the rows of the dataset
iris <- iris[sample(nrow(iris)), ]
irisValues <- iris[, 1:4]                             # input features
irisTargets <- iris[, 5]                              # class labels
irisTargetsDecoded <- decodeClassLabels(irisTargets)  # one-hot encode the labels
iris2 <- splitForTrainingAndTest(irisValues, irisTargetsDecoded, ratio = 0.15)
iris2 <- normTrainingAndTestSet(iris2)                # normalize the inputs

set.seed(432)
model <- mlp(iris2$inputsTrain, iris2$targetsTrain, 
             size=5, learnFuncParams=c(0.1), maxit=50, 
             inputsTest=iris2$inputsTest, targetsTest=iris2$targetsTest)

predictions <- predict(model, iris2$inputsTest)
head(rowSums(predictions))
# 139        26        17       104        54        82 
# 1.0227419 1.0770722 1.0642565 1.0764587 0.9952268 0.9988647 

probs <- predictions / rowSums(predictions)
head(rowSums(probs))
# 139  26  17 104  54  82 
# 1   1   1   1   1   1 

# nnet example --------------------------------------
library(caret)
training <- sample(seq_along(irisTargets), size = 100, replace = FALSE)
modelCaret <- train(y = irisTargets[training], 
                    x = irisValues[training, ],
                    method = "nnet")
predictionsCaret <- predict(modelCaret, 
                            newdata = irisValues[-training, ],
                            type = "prob")
head(rowSums(predictionsCaret))
# 122 100  89 134  30  86 
# 1   1   1   1   1   1 

Upvotes: 1
