Reputation: 622
I used the "Stroke" dataset from Kaggle to compare the accuracy of the following classification models:
K-Nearest-Neighbors (KNN)
Decision Trees
AdaBoost
Logistic Regression
I did not implement the models myself; I used the sklearn library's implementations.
After training the models, I ran the test data and printed each model's accuracy. These are the results:
As you can see, KNN, AdaBoost, and Logistic Regression gave me exactly the same accuracy.
My question is: does it make sense that there is not even a small difference between them, or did I make a mistake somewhere along the way (even though I only used sklearn's implementations)?
Upvotes: 1
Views: 712
Reputation: 66805
In general, achieving exactly the same scores across different models is unlikely, and the most probable explanation here is class imbalance. The Stroke dataset has 249 positive samples out of roughly 5,000 datapoints, so a model that always predicts "no stroke" will be right about 95% of the time. My best guess is that all your models failed to learn anything useful and are constantly outputting "0".
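You can verify this effect directly: a constant majority-class predictor already reaches about 95% accuracy at this imbalance. Here is a minimal sketch using synthetic labels with the same class ratio (not your actual data):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Synthetic labels mimicking the Stroke dataset's imbalance:
# 249 positives ("stroke") out of 5,000 samples.
y_true = np.array([1] * 249 + [0] * 4751)

# A "model" that always predicts the majority class ("no stroke").
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)
print(round(acc, 4))  # high accuracy despite learning nothing
```

If `np.unique(model.predict(X_test))` returns only `[0]` for each of your models, this is exactly what happened.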
In general, accuracy is not the right metric for highly imbalanced datasets. Consider balanced accuracy, F1, etc.
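On the same synthetic imbalance, these metrics immediately expose a constant predictor (a sketch assuming scikit-learn's metric functions):

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, f1_score

# Same synthetic imbalance as the Stroke dataset: 249 positives in 5,000.
y_true = np.array([1] * 249 + [0] * 4751)
y_pred = np.zeros_like(y_true)  # constant "no stroke" predictor

# Balanced accuracy averages recall over both classes:
# (1.0 for class 0 + 0.0 for class 1) / 2 = 0.5, i.e. chance level.
bal_acc = balanced_accuracy_score(y_true, y_pred)

# F1 is 0.0 because the predictor never finds a single positive.
f1 = f1_score(y_true, y_pred, zero_division=0)

print(bal_acc, f1)
```

Plain accuracy reports ~0.95 for this predictor, while balanced accuracy (0.5) and F1 (0.0) show it performs no better than guessing.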
Upvotes: 2