narayananv10

Reputation: 25

Confusion regarding the accuracy of my model

I'm facing this issue in my college project, which is a clickbait news classifier, i.e., it classifies headlines as clickbait or non-clickbait. I'm using a dataset which has 16000 headlines of each type. I train my network on 70% of the data and hold out the remaining 30% as my test set. My validation set is 30% of the training set. After fitting the model and evaluating it on my test set, I get this:

After fitting:

Epoch 10/10 131/131 [==============================] - 1s 6ms/step - loss: 0.2098 - accuracy: 0.9457 - val_loss: 0.3263 - val_accuracy: 0.9417

After evaluating on test set:

300/300 [==============================] - 1s 2ms/step - loss: 0.3030 - accuracy: 0.9432

Confusion Matrix:

array([[4638, 162], [ 383, 4417]])

I'm very new to neural networks and I'm not sure if these accuracies are supposed to be this similar to each other. Is this something that I should be concerned about, or am I missing something? I appreciate all the help I can get. Thanks!

Upvotes: 0

Views: 40

Answers (2)

jmauricio

Reputation: 115

Your question is not exactly about coding, so I would advise you to use https://stats.stackexchange.com/ next time.

Answering: You did everything right, and your results show that your model generalizes well to new samples. I would check the % of click-baits in all sets just to be sure, but you did everything fine and got a good model in the end, congratulations.
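As a quick sanity check on the test set, the class balance and the accuracy can be recovered from the confusion matrix in the question. This is only a sketch: it assumes the rows are the true classes and the columns are the predicted classes (the scikit-learn convention), which the question does not state.

```python
import numpy as np

# Confusion matrix quoted in the question (test set).
# Assumption: rows = true classes, columns = predicted classes.
cm = np.array([[4638, 162],
               [383, 4417]])

class_totals = cm.sum(axis=1)          # number of test samples per true class
class_share = class_totals / cm.sum()  # fraction of the test set per class
accuracy = np.trace(cm) / cm.sum()     # correct predictions / all predictions

print(class_totals)        # [4800 4800]
print(class_share)         # [0.5 0.5]
print(round(accuracy, 4))  # 0.9432
```

Under that assumption the test set is split 50/50 between the two classes, and the accuracy matches the 0.9432 reported by the evaluation. The same check on the training and validation labels would need the label arrays themselves.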

If you're still unsure of your results: you could change the strategy for splitting your data between train, test and validation set. For instance you could check if your data set has some sort of "date of publication" and use it:

  • Use the earliest news as your training set
  • For your validation set you could either: pick the latest samples from your training set OR use an intermediate period between the training and test sets
  • Use the most recent news as your test set

This way you would be testing whether your model can correctly classify click-baits published after the ones it was trained on.
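A minimal sketch of such a time-based split with pandas. The column names (`headline`, `label`, `published`), the 60/20/20 proportions, the helper `chronological_split`, and the toy data are all hypothetical; the question does not say whether the dataset has a publication date at all.

```python
import pandas as pd

def chronological_split(df, date_col, train_frac=0.6, val_frac=0.2):
    """Split rows into train/validation/test by time, oldest rows first."""
    ordered = df.sort_values(date_col).reset_index(drop=True)
    n = len(ordered)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    train = ordered.iloc[:train_end]      # earliest news
    val = ordered.iloc[train_end:val_end]  # intermediate period
    test = ordered.iloc[val_end:]          # most recent news
    return train, val, test

# Toy data standing in for the real headlines (hypothetical columns).
df = pd.DataFrame({
    "headline": [f"headline {i}" for i in range(10)],
    "label": [i % 2 for i in range(10)],
    "published": pd.date_range("2020-01-01", periods=10, freq="D"),
})
df = df.sample(frac=1, random_state=0)  # shuffle to show that the sort matters

train, val, test = chronological_split(df, "published")
print(len(train), len(val), len(test))
```

Every validation row is newer than every training row, and every test row is newer than every validation row, so the test accuracy reflects headlines published after the training period.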

Upvotes: 1

Gerry P

Reputation: 8092

Your results look fine. Your test set and validation set must have probability distributions that closely match each other, which is why the validation and test accuracies are so similar.
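One rough way to see that the gap is just sampling noise is to compare it with the standard error of each accuracy. This sketch treats each accuracy as a binomial proportion and the two sets as independent. The validation size of 6720 is derived from the split described in the question (30% of the 70% training portion of 32000 headlines) and is an assumption; the test size of 9600 is the sum of the confusion matrix.

```python
import math

def accuracy_std_error(acc, n):
    """Standard error of an accuracy measured on n independent samples."""
    return math.sqrt(acc * (1 - acc) / n)

val_acc, n_val = 0.9417, 6720    # n_val assumed: 0.3 * 0.7 * 32000
test_acc, n_test = 0.9432, 9600  # n_test: sum of the confusion matrix

gap = abs(test_acc - val_acc)
# Standard error of the difference between two independent estimates.
se_gap = math.sqrt(accuracy_std_error(val_acc, n_val) ** 2
                   + accuracy_std_error(test_acc, n_test) ** 2)

print(f"gap = {gap:.4f}, standard error of the gap = {se_gap:.4f}")
```

The gap of about 0.0015 is smaller than one standard error (roughly 0.004), so a difference of this size is expected when both sets are drawn from the same distribution.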

Upvotes: 2
