user3785898

Reputation: 35

How to use R's neuralnet package in a Kaggle competition about Titanic

I am trying to run this code for the Kaggle Titanic competition as an exercise. It's free and a beginner case. I am using the neuralnet package in R.

This is the train data from the website:

train <- read.csv("train.csv")
m <- model.matrix(  ~ Survived + Pclass + Sex + Age + SibSp, data =train )
head(m)
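For reference, `model.matrix` expands the factor `Sex` into a dummy column and prepends an intercept; a minimal sketch with made-up rows (column names chosen to mirror the Titanic CSV):

```r
# Tiny stand-in data frame: model.matrix turns the factor Sex into a
# dummy column named "Sexmale" and adds an "(Intercept)" column.
df <- data.frame(Survived = c(0, 1, 1),
                 Pclass   = c(3, 1, 2),
                 Sex      = factor(c("male", "female", "female")),
                 Age      = c(22, 38, 26),
                 SibSp    = c(1, 1, 0))
mm <- model.matrix(~ Survived + Pclass + Sex + Age + SibSp, data = df)
colnames(mm)
# [1] "(Intercept)" "Survived" "Pclass" "Sexmale" "Age" "SibSp"
```

This is why the formula below refers to `Sexmale` rather than `Sex`: the factor no longer exists in the matrix, only its dummy column.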

Here I train the neural network on who survived, to see whether I can predict survival:

library(neuralnet)

r <- neuralnet( Survived ~ Pclass + Sexmale + Age + SibSp, 
data=m, hidden=10, threshold=0.01,rep=100)

The net is trained. I load the test data and prepare it for testing.

test=read.csv("test.csv")

m2 <- model.matrix(  ~  Pclass + Sex + Age + SibSp, data = test )

The final test for prediction:

res= compute(r, m2)

First, I do not know how many hidden neurons I should use. Sometimes training takes too long, and when it succeeds I cannot run the prediction on the test data because an error occurs saying the two data sets are not compatible:

res= compute(r, m2)

Error in neurons[[i]] %*% weights[[i]] : non-conformable arguments

What am I doing wrong here?

The whole code:

train <- read.csv("train.csv")
m <- model.matrix(  ~ Survived + Pclass + Sex + Age + SibSp, data =train )
head(m)

library(neuralnet)

r <- neuralnet( Survived ~ Pclass + Sexmale + Age + SibSp, 
data=m, hidden=10, threshold=0.01,rep=100)

test=read.csv("test.csv")

m2 <- model.matrix(  ~  Pclass + Sex + Age + SibSp, data = test )

res= compute(r, m2)

Upvotes: 3

Views: 1711

Answers (1)

chappers

Reputation: 2415

Try using this to predict instead:

res = compute(r, m2[,c("Pclass", "Sexmale", "Age", "SibSp")])

That worked for me and you should get some output.

What appears to have happened: model.matrix creates an additional column, `(Intercept)`, which was not part of the data used to build the neural net, so the compute function does not know what to do with it. Internally, neuralnet performs a matrix multiplication, and the extra column makes the input matrix the wrong size. The solution is to explicitly select the columns needed in the compute call.
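To see the shape problem concretely, here is a minimal sketch (a zero matrix stands in for `m2`; the column names are as `model.matrix` would produce them):

```r
# The model was trained on four covariates, but the model.matrix output
# carries a fifth "(Intercept)" column, so compute() would receive a
# matrix that is one column too wide. Selecting the covariates by name
# restores the expected width.
m2_demo <- matrix(0, nrow = 3, ncol = 5,
                  dimnames = list(NULL, c("(Intercept)", "Pclass",
                                          "Sexmale", "Age", "SibSp")))
covariates <- m2_demo[, c("Pclass", "Sexmale", "Age", "SibSp")]
dim(covariates)  # 3 4 -- now matches the four inputs in the formula
```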


As for how many neurons to use, or optimizing hyperparameters in general, you could use cross-validation and similar methods. If using a different package (nnet) is fine, then you can use the caret package to determine the optimal parameters for you. It would look like this:

library(caret)
nnet.model <- train(Survived ~ Pclass + Sex + Age + SibSp, 
                    data=train, method="nnet")
plot(nnet.model)
res2 = predict(nnet.model, newdata=test)

with the plot of the hyperparameters being this:

[plot: model accuracy against the tuned hyperparameters of nnet (hidden units and weight decay)]


You can measure performance using the confusionMatrix in the caret package:

library(neuralnet)
library(caret)
library(dplyr)
train <- read.csv("train.csv")
m <- model.matrix(  ~ Survived + Pclass + Sex + Age + SibSp, data =train )

r <- neuralnet( Survived ~ Pclass + Sexmale + Age + SibSp, 
                data=m, rep=20)

res = neuralnet::compute(r, m[,c("Pclass", "Sexmale", "Age", "SibSp")])
pred_train = round(res$net.result)

# keep only the rows that received a survival prediction; not all records
# are predicted because model.matrix drops rows with missing values (e.g. Age)
pred_rowid <- as.numeric(row.names(pred_train))
train_survived <- train %>% filter(row_number() %in% pred_rowid) %>% select(Survived)
confusionMatrix(as.factor(train_survived$Survived), as.factor(pred_train))

Output:

Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0 308 128
         1 164 114

               Accuracy : 0.5910364             
                 95% CI : (0.5539594, 0.6273581)
    No Information Rate : 0.6610644             
    P-Value [Acc > NIR] : 0.99995895            

                  Kappa : 0.119293              
 Mcnemar's Test P-Value : 0.04053844            

            Sensitivity : 0.6525424             
            Specificity : 0.4710744             
         Pos Pred Value : 0.7064220             
         Neg Pred Value : 0.4100719             
             Prevalence : 0.6610644             
         Detection Rate : 0.4313725             
   Detection Prevalence : 0.6106443             
      Balanced Accuracy : 0.5618084             

       'Positive' Class : 0    
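As a sanity check, the headline numbers follow directly from the table: accuracy is the diagonal over the total, and sensitivity (for positive class 0) is 308/472. A quick base-R check:

```r
# Confusion matrix from the output above (rows = Prediction, cols = Reference)
tab <- matrix(c(308, 164, 128, 114), nrow = 2,
              dimnames = list(Prediction = c("0", "1"),
                              Reference  = c("0", "1")))
accuracy    <- sum(diag(tab)) / sum(tab)   # (308 + 114) / 714
sensitivity <- tab[1, 1] / sum(tab[, 1])   # 308 / 472
round(c(accuracy, sensitivity), 7)  # 0.5910364 0.6525424
```

Note that accuracy is below the No Information Rate (0.661), i.e. always predicting "did not survive" would do better on these rows.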

Upvotes: 3
