Aaraeus

Reputation: 1155

What's the difference between the score method on a fitted model, vs accuracy_score from scikit-learn?

I'd normally just post this to Stack Overflow, but I thought about it and realised it's not actually a coding question - it's an ML question.

Any other feedback on code or anything else is thoroughly appreciated and welcomed!

The Jupyter Notebook

So I'm doing the Titanic problem on Kaggle. I have my four datasets ready to go:
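Roughly, the split looks like this (paraphrasing from the notebook; the exact parameters and the *_train names here may differ slightly, but features_test and target_test are the names used below):

from sklearn.model_selection import train_test_split

# df is the cleaned-up Titanic training DataFrame built earlier in the notebook
features = df.drop("Survived", axis=1)
target = df["Survived"]

features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=0.25, random_state=42)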

With this in mind, I have two questions, though the second one is the important one.

Question 1: Is my understanding of the next step correct?

We fit our model on the training data, then we create a prediction (pred) from our features_test data. This means that pred and target_test should in theory be identical (if the model worked perfectly).

So to assess the accuracy of the model, we can simply compare pred against target_test, which is what scikit-learn's accuracy_score function does.
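In code terms, what I mean is roughly this (clf stands for whichever classifier is being fit):

from sklearn.metrics import accuracy_score

clf.fit(features_train, target_train)
pred = clf.predict(features_test)

# fraction of predictions in pred that match target_test
accuracy_score(target_test, pred)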

Question 2: What's the difference between using the score method of the model, vs the accuracy_score function?

This is what's confusing me. You can see in cell 97, the first cell under the "Model 1" header, that I use:

clf.score(features_test, target_test)

This comes out with a result of

0.8609865470852018

However, later, I also use:

from sklearn.metrics import accuracy_score
print(accuracy_score(target_test, pred))

And this also results in

0.8609865470852018

How are both of these scores the same? Have I done something wrong? Or are both of these steps basically doing the same thing? How? Is the score() method effectively creating a pred array and checking against that in the background?

Upvotes: 4

Views: 4745

Answers (1)

desertnaut

Reputation: 60399

For such issues, arguably your best friend is the documentation; quoting from scikit-learn docs on model evaluation:

There are 3 different APIs for evaluating the quality of a model’s predictions:

  • Estimator score method: Estimators have a score method providing a default evaluation criterion for the problem they are designed to solve. This is not discussed on this page, but in each estimator’s documentation.
  • Scoring parameter: Model-evaluation tools using cross-validation (such as model_selection.cross_val_score and model_selection.GridSearchCV) rely on an internal scoring strategy. This is discussed in the section The scoring parameter: defining model evaluation rules.
  • Metric functions: The metrics module implements functions assessing prediction error for specific purposes. These metrics are detailed in sections on Classification metrics, Multilabel ranking metrics, Regression metrics and Clustering metrics.

In the docs of all 3 classifiers you are using in your code (logistic regression, random forest, and decision tree), there is an identical description:

score(X, y, sample_weight=None)
Returns the mean accuracy on the given test data and labels.

which answers your 2nd question for the specific models used.
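You can verify the equivalence yourself; here is a minimal sketch on a toy dataset (any of the three classifiers would do):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# the two numbers are identical: score() predicts internally and then
# computes the mean accuracy against the true labels
print(clf.score(X_test, y_test))
print(accuracy_score(y_test, clf.predict(X_test)))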

Nevertheless, you should always check the docs before blindly trusting the score method that comes with an estimator; in linear regression and decision tree regressors, for example, score returns the coefficient of determination R^2, which is practically never used by ML practitioners building predictive models (it is often used by statisticians building explanatory models, but that's another story).
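For instance, with a regressor, score gives R^2 rather than accuracy; a quick sketch (using the diabetes toy dataset purely for illustration):

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = LinearRegression().fit(X_train, y_train)

# here score() is the coefficient of determination R^2 (i.e. r2_score),
# not accuracy
print(reg.score(X_test, y_test))
print(r2_score(y_test, reg.predict(X_test)))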

BTW, I took a brief look at the code you link to, and I saw that you compute metrics like MSE, MAE, and RMSE - keep in mind that these are regression metrics, and they are not meaningful in a classification setting, such as the one you face here (and in turn, accuracy is meaningless in regression settings)...
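For a classification problem like Titanic, stick to classification metrics; a rough sketch of the sort of thing you would report instead (pred and target_test being the names from your question):

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

print(accuracy_score(target_test, pred))
print(confusion_matrix(target_test, pred))
print(classification_report(target_test, pred))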

Upvotes: 6
