Sayan Ray

MLP model using Keras package in R fails to learn (High training and val_accuracy but very poor performance on test data)

UPDATE: To help anyone looking for an answer to a similar question, I was able to increase AUC by balancing the dataset. I did this with the following edit to the code:


    history <- model %>%
      fit(train_nn,
          train_target, # when using OHE this becomes train_label
          epochs = 100,
          batch_size = 32,
          validation_split = 0.10,
          # weight class 0 by the ratio of class-1 to class-0 rows
          # (column 134 holds the target)
          class_weight = list("0" = nrow(dataset[dataset[, 134] == 1, ]) /
                                    nrow(dataset[dataset[, 134] == 0, ]),
                              "1" = 1))
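
An equivalent way to compute the weights, sketched below on the assumption that train_target is a plain 0/1 vector, is to derive them from the training split with table() instead of hard-coding column 134:

    # derive class weights from the training split (assumes train_target
    # is a 0/1 vector) rather than hard-coding the target column index
    counts <- table(train_target)
    class_weight <- list("0" = as.numeric(counts["1"] / counts["0"]),
                         "1" = 1)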

End of Update

I am currently studying biases in the predictions of neural network models. Using data from the fintech company Bondora, I am attempting to build an MLP model to predict loan acceptance. The dataset contains multiple categorical and numerical variables. I created a categorical variable called "reject_loan" (my target variable), which is 1 if a loan defaults within 1 year of origination and 0 otherwise. I am now attempting to build an MLP model to predict "reject_loan".

Problem: Even though accuracy and validation accuracy are both high (around 83% and 90% respectively; see the training history plot of loss, val_loss, acc and val_acc), predictions on test data are very poor. The model usually predicts only one class for all observations, or makes only very few correct predictions of the other class. AUC always hovers close to 50%.
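
A quick way to see whether class imbalance is driving this, assuming "reject_loan" is the last column of the dataset:

    # proportion of each class in the target; a heavy skew would explain
    # high accuracy together with one-class predictions
    prop.table(table(dataset[, ncol(dataset)]))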

I have tried a variety of approaches in pre-processing and in model parameters. Some of the major approaches are below:

  1. Using OHE for all categorical variables (including the target), normalizing the numerical variables, and then using relu activation for the hidden layers, softmax for the output layer and categorical cross-entropy as the loss function (see the sketch after this list)
  2. No OHE, normalizing the numerical variables, and then using relu activation for the hidden layers, sigmoid for the output layer and binary cross-entropy as the loss function
  3. Using elu activation for the hidden layers to guard against dying relu units
  4. Using multiple hidden layers with and without regularizers (l1 and l2)
  5. Using dropout
  6. Using SGD and Adam as optimizers (i.e. either SGD or Adam)
  7. Decreasing the learning rate (the lowest used is 0.000001)
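
For reference, here is a minimal sketch of the two output configurations from approaches 1 and 2. The hidden layer and n_features are placeholders (n_features stands in for ncol(train_nn)), not my actual architecture:

    library(keras)
    n_features <- 10 # placeholder for ncol(train_nn)

    # (a) integer 0/1 target: sigmoid output + binary cross-entropy
    model_a <- keras_model_sequential() %>%
      layer_dense(units = 16, activation = 'relu', input_shape = n_features) %>%
      layer_dense(units = 1, activation = 'sigmoid')
    model_a %>% compile(loss = 'binary_crossentropy', optimizer = 'adam',
                        metrics = 'accuracy')

    # (b) OHE target via to_categorical: softmax output + categorical cross-entropy
    model_b <- keras_model_sequential() %>%
      layer_dense(units = 16, activation = 'relu', input_shape = n_features) %>%
      layer_dense(units = 2, activation = 'softmax')
    model_b %>% compile(loss = 'categorical_crossentropy', optimizer = 'adam',
                        metrics = 'accuracy')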

Nothing has worked to improve predictive performance. I should also mention that I have trained an XGBoost model on the same dataset with an AUC of around 90% (see the ROC curve and AUC from one of those runs).
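
For comparison, a minimal sketch of such an XGBoost baseline, reusing the train/test split from the model code below; the hyperparameters here (nrounds etc.) are illustrative assumptions, not the settings from my actual runs:

    library(xgboost)
    # baseline on the same split; nrounds and other settings are assumptions
    dtrain <- xgb.DMatrix(data = as.matrix(train), label = train_target)
    bst <- xgboost(data = dtrain, nrounds = 100, objective = "binary:logistic",
                   eval_metric = "auc", verbose = 0)
    xgb_prob <- predict(bst, as.matrix(test))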

I would very much appreciate it if someone could help me with this issue.

My model code is as follows:

library(keras) # model definition and training
library(pROC)  # roc() / AUC at the end

#divide into train and test
set.seed(1234)
#dividing observations into an 80:20 train:test sample
sample <- sample(2, nrow(dataset), replace = TRUE, prob = c(0.80, 0.20))
train <- dataset[sample == 1, 1:(ncol(dataset) - 1)] #parentheses matter: 1:ncol(dataset)-1 evaluates to 0:(ncol(dataset)-1)
test <- dataset[sample == 2, 1:(ncol(dataset) - 1)]
train_target <- dataset[sample == 1, ncol(dataset)]
test_target <- dataset[sample == 2, ncol(dataset)]

#matrix versions of the predictors for Keras (assumes all columns are
#already numeric, i.e. normalized and/or one-hot encoded)
train_nn <- as.matrix(train)
test_nn <- as.matrix(test)

#One hot encoding
train_label <- to_categorical(train_target)
test_label <- to_categorical(test_target)


#Create sequential model
model <- keras_model_sequential()
model %>%
  layer_dense(units = 16,
              activation = 'elu',
              input_shape = c(ncol(train_nn)),
              kernel_regularizer = regularizer_l1_l2(l1 = 0.2, l2 = 0.2)) %>% 
  layer_dropout(0.2) %>% 
  layer_dense(units = 8,
              activation = 'elu',
              kernel_regularizer = regularizer_l1_l2(l1 = 0.2, l2 = 0.2)) %>%
  layer_dropout(0.4) %>% 
  layer_dense(units = 8,
              activation = 'elu') %>%
  layer_dense(units = 1,
              activation = 'sigmoid') #sigmoid in this iteration; I have also used softmax with an OHE target and units = 2
              
#compile
opt = optimizer_sgd(lr = 0.001,
                    momentum = 0,
                    decay = 0,
                    nesterov = FALSE,
                    clipnorm = NULL,
                    clipvalue = NULL
                        )

opt2 = optimizer_adam(lr = 0.000001,
                      beta_1 = 0.9,
                      beta_2 = 0.999,
                      epsilon = NULL,
                      decay = 0,
                      amsgrad = FALSE,
                      clipnorm = NULL,
                      clipvalue = NULL
                      )


model %>% 
  compile(loss = 'binary_crossentropy', #also used categorical_crossentropy in some iterations
          optimizer = opt2,
          metrics = 'accuracy')

#Fit model
clbck <- callback_reduce_lr_on_plateau(monitor = 'val_loss', factor = 0.1, patience = 2)
history <- model %>%
              fit(train_nn,
                  train_target, #when using OHE becomes train_label
                  epochs = 100,
                  batch_size = 32,
                  validation_split = 0.10,
                  callbacks = list(clbck)) #the callback must be passed to fit() to take effect

#Evaluate model with test data
nn_model_3 <- model %>% evaluate(test_nn, test_target) #when using OHE this becomes test_label

#Prediction & confusion matrix - test data
prob <- model %>% 
          predict_proba(test_nn)

pred <- model %>% 
          predict_classes(test_nn)

nn_conf_table_3 <- table(Predicted = pred, Actual = test_target)

nn_probability_table_3 <- cbind(prob, pred, test_target)

#auc AND roc
par(pty = "s")
nn_roc_3 <- roc(test_target, as.vector(prob), plot = TRUE, percent = TRUE, lwd = 3, print.auc = TRUE) #AUC should be computed from predicted probabilities, not hard class labels
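
With the probabilities in hand, one can also check whether the one-class predictions are just an artifact of the default 0.5 cutoff. A minimal sketch using pROC's coords() ("best" picks the Youden-optimal threshold by default):

    # tune the classification cutoff from the ROC curve instead of 0.5;
    # with imbalanced classes the default cutoff often collapses predictions
    # into a single class
    best_cut <- unlist(coords(nn_roc_3, "best", ret = "threshold"))[1]
    pred_tuned <- ifelse(as.vector(prob) > best_cut, 1, 0)
    table(Predicted = pred_tuned, Actual = test_target)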

Upvotes: 0

Views: 226
