kruparulz14

Reputation: 163

Different training accuracy for different models but same testing accuracy

I am working on a deep learning classifier with 2 classes. The dataset I am working with is imbalanced, so I down-sampled to address that. I then take a small sample of data from both classes and build a deep learning model as follows:

from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers
from keras.callbacks import EarlyStopping

dl_model = Sequential()

n_cols = X_train.shape[1]

dl_model.add(Dense(1024, activation='relu', input_shape=(n_cols,)))
dl_model.add(Dense(512, activation='relu'))
dl_model.add(Dense(256, activation='relu'))
dl_model.add(Dense(256, activation='relu'))
dl_model.add(Dense(128, activation='relu'))
dl_model.add(Dense(64, activation='relu'))
dl_model.add(Dense(2, activation='softmax'))

adam = optimizers.Adam(lr=0.001)

dl_model.compile(optimizer=adam, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

early_stopping_monitor = EarlyStopping(patience=3)

dl_model.fit(X_train, y_train, epochs=10, validation_split=0.2, batch_size=1000, callbacks=[early_stopping_monitor], shuffle=True)

model_json = dl_model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)

dl_model.save_weights("model.h5")

For different hyperparameter settings, I get results like:

Model 1 - train_loss: 7.7971 - train_acc: 0.5160 - val_loss: 9.6992 - val_acc: 0.3982

Model 2 - train_loss: 2.8257 - train_acc: 0.8201 - val_loss: 2.9312 - val_acc: 0.8160

Model 3 - train_loss: 3.1887 - train_acc: 0.8002 - val_loss: 3.5195 - val_acc: 0.7808

I save each of these models and then load them in a different file, where I apply the model to the whole dataset and calculate the metrics as follows:

from sklearn.preprocessing import MinMaxScaler
from keras.models import model_from_json

sc = MinMaxScaler()
X = sc.fit_transform(X)

json_file = open('model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)
loaded_model.load_weights("model.h5")

loaded_model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
score = loaded_model.evaluate(X, y, verbose=0)
print("Deep learning accuracy %s: %.2f%%" % (loaded_model.metrics_names[1], score[1]*100))

All three models give the same accuracy, and even the same confusion matrix. What could be the reason? Shouldn't the three models give different results, since they have different training accuracies/metrics?

Update:

When loading any of the models, I get the same accuracy of 97.82% and the following confusion matrix:

[[143369      0]
 [  2958      0]]

Upvotes: 0

Views: 1497

Answers (1)

Timbus Calin

Reputation: 14983

The problem here is that none of the neural networks you have trained is able to properly learn the second class, the less well-represented one.

The accuracy on the test set is the same because none of model_1, model_2, or model_3 is able to distinguish class 1 from class 2: all three of them recognise class 1 but fail to recognise class 2. In other words, when you test on your test set, the results are the same, regardless of the differences you see during training.

This can be easily inferred from the confusion matrix you displayed: every sample is predicted as the first class, and none as the second.
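One way to make this explicit (an illustrative addition, not from the original answer) is scikit-learn's classification_report, using the y_pred computed in the question's evaluation snippet:

from sklearn.metrics import classification_report

# With the confusion matrix above, recall for the minority class is 0.0:
# the model never predicts it, even though overall accuracy looks high.
print(classification_report(y, y_pred, digits=4))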

Suppose you did not already know this. Let us do some simple math:

  • 143369 + 2958 = 146327.
  • (143369 / 146327) * 100 ≈ 97.98% (slightly higher than your reported 97.82%, but in the same ballpark; the minor difference stems from how evaluate in Keras computes the score)

You can also infer from this (not only by visually seeing that you have no true positives (TP) for class 2) that you have a problem.

Let us proceed now to tackle this issue!

Having made this observation, you should do the following to tackle the issue (possibly combining several of these):

First of all, start with a lower learning rate (0.0001 is a much better starting choice).

Second, follow these procedures to obtain a good model (a combined sketch follows the list):

  1. Remove the EarlyStopping(patience=3).
  2. Save your best model according to a metric other than accuracy (for example, F1-score).
  3. Reduce the learning rate during training with ReduceLROnPlateau. This callback is much better suited to your case than EarlyStopping: https://keras.io/callbacks/#reducelronplateau
  4. Use dataset enrichment. The best way to tackle an imbalanced dataset is oversampling: rather than under-sampling the well-represented class and thus reducing the variance of your dataset, balance the class support by adding more examples of the minority class.
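A minimal sketch of how these suggestions could be combined, keeping the question's setup; the oversampling step assumes the imbalanced-learn package (RandomOverSampler), and the exact hyperparameter values are illustrative rather than prescriptive:

from keras import optimizers
from keras.callbacks import ReduceLROnPlateau
from imblearn.over_sampling import RandomOverSampler  # assumes imbalanced-learn is installed

# Oversample the minority class instead of down-sampling the majority class.
# Note: ideally oversample only the training portion, so duplicated samples
# do not leak into the validation split.
ros = RandomOverSampler(random_state=42)
X_train_bal, y_train_bal = ros.fit_resample(X_train, y_train)

# Start from a lower learning rate and reduce it further when val_loss
# plateaus, instead of stopping early.
adam = optimizers.Adam(lr=0.0001)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=2, min_lr=1e-6)

dl_model.compile(optimizer=adam, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
dl_model.fit(X_train_bal, y_train_bal, epochs=10, validation_split=0.2,
             batch_size=1000, callbacks=[reduce_lr], shuffle=True)

Saving the best model by F1-score (point 2) typically requires a custom metric or callback, since F1 is not a built-in Keras metric.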

Upvotes: 2
