Reputation: 25
I'm facing an issue in my college project, which is a clickbait news classifier, i.e., it classifies headlines as clickbait or non-clickbait. I'm using a dataset with 16,000 headlines of each type. I train my network on 70% of the data, with the remaining 30% as my test set, and my validation set is 30% of the training set. After fitting and evaluating the model on my test set, I get this:
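For context, this is roughly how I set up the split (`headlines` and `labels` stand in for my actual arrays, and `model` is my compiled Keras model):

```python
from sklearn.model_selection import train_test_split

# 70% train / 30% test
X_train, X_test, y_train, y_test = train_test_split(
    headlines, labels, test_size=0.3, random_state=42)

# validation set is 30% of the training set, carved out by Keras
model.fit(X_train, y_train, epochs=10, validation_split=0.3)
```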
After fitting:
Epoch 10/10 131/131 [==============================] - 1s 6ms/step - loss: 0.2098 - accuracy: 0.9457 - val_loss: 0.3263 - val_accuracy: 0.9417
After evaluating on test set:
300/300 [==============================] - 1s 2ms/step - loss: 0.3030 - accuracy: 0.9432
Confusion Matrix:
array([[4638,  162],
       [ 383, 4417]])
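As a sanity check, computing the accuracy by hand from this matrix gives the same number as evaluate():

```python
import numpy as np

cm = np.array([[4638, 162],
               [383, 4417]])

# accuracy = correct predictions / all predictions
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # ~0.9432, matching the evaluate() output
```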
Now, I'm very new to neural networks, and I'm not sure if these accuracies are supposed to be this similar to each other. Is this something I should be concerned about, or am I missing something? I appreciate all the help I can get. Thanks!
Upvotes: 0
Views: 40
Reputation: 115
Your question is not exactly about coding; I would advise you to ask on https://stats.stackexchange.com/ next time.
Answering: you did everything right, and your results show that your model generalizes well to new samples. I would check the percentage of clickbait headlines in each set just to be sure (see the sketch below), but you did everything fine and got a good model in the end. Congratulations.
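A quick way to run that check, assuming the labels for each split live in arrays named `y_train`, `y_val`, and `y_test` (adapt the names to your code):

```python
import numpy as np

# fraction of clickbait headlines (label 1) in each split;
# all three should be close to 0.5 for this balanced dataset
for name, y in [("train", y_train), ("val", y_val), ("test", y_test)]:
    print(f"{name}: {np.mean(y):.3f} clickbait")
```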
If you're still unsure of your results, you could change the strategy for splitting your data between the train, validation, and test sets. For instance, you could check whether your dataset has some sort of "date of publication" and use it: train on the older headlines and hold out the most recent ones for testing.
This way you can be more confident that your model will also classify future clickbait correctly.
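A minimal sketch of such a temporal split, assuming a pandas DataFrame `df` with hypothetical `headline`, `label`, and `date` columns:

```python
import pandas as pd

df = df.sort_values("date")       # oldest headlines first
cutoff = int(len(df) * 0.7)       # keep the same 70/30 ratio

train_df = df.iloc[:cutoff]      # older headlines for training
test_df = df.iloc[cutoff:]       # newest headlines for testing
```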
Upvotes: 1
Reputation: 8092
Your results look fine. Your test set and validation set are drawn from the same data, so their probability distributions are a close match to each other; hence the similarity between the validation and test accuracies. If you want to guarantee that match, you can stratify the splits on the label, as in the sketch below.
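A minimal sketch with scikit-learn's `stratify` option (`headlines` and `labels` are hypothetical names for the full dataset):

```python
from sklearn.model_selection import train_test_split

# stratify keeps the clickbait/non-clickbait ratio identical in every
# split, so the validation and test distributions match by construction
X_train, X_test, y_train, y_test = train_test_split(
    headlines, labels, test_size=0.3, stratify=labels, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.3, stratify=y_train, random_state=42)
```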
Upvotes: 2