Reputation: 7089
I recently looked at a bunch of sklearn tutorials, which were all similar in that they scored the goodness of fit by:
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
And it'll spit out:
0.92345...
or some other score.
I am curious as to the parameters of the clf.score function or how it scores the model. I looked all over the internet, but can't seem to find documentation for it. Does anyone know?
Upvotes: 29
Views: 79194
Reputation: 11
scikit-learn's model.score(X, y) for regressors computes the coefficient of determination, R². It is a simple method called as model.score(X_test, y_test). It doesn't require a y_predicted value to be supplied externally to calculate the score for you; rather, it computes y_predicted internally and uses it in the calculation.
This is how it is done:
y_predicted = model.predict(X_test)
u = ((y_test - y_predicted) ** 2).sum()
v = ((y_test - y_test.mean()) ** 2).sum()
score = 1 - (u / v)
and you get the score. Hope that helps.
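Putting that together, here is a runnable sketch (with made-up toy data) that checks the manual formula against what model.score returns:

```python
# Sketch: verify that LinearRegression.score matches the manual R^2
# computation above. The data here is invented purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

X_test = np.array([[1.0], [2.0], [3.0], [4.0]])
y_test = np.array([1.1, 1.9, 3.2, 3.9])

model = LinearRegression().fit(X_test, y_test)  # fit on the same toy data

y_predicted = model.predict(X_test)
u = ((y_test - y_predicted) ** 2).sum()    # residual sum of squares
v = ((y_test - y_test.mean()) ** 2).sum()  # total sum of squares
manual_score = 1 - (u / v)

# manual_score and model.score(X_test, y_test) agree
print(manual_score - model.score(X_test, y_test))
```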
Upvotes: 1
Reputation: 133
Here is how the score is calculated for a regressor:
score(self, X, y, sample_weight=None)[source] Returns the coefficient of determination R^2 of the prediction.
The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((ytrue - ypred) ** 2).sum() and v is the total sum of squares ((ytrue - ytrue.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.
From sklearn documentation.
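The last sentence of that quote is easy to check: sklearn's DummyRegressor with strategy="mean" is exactly such a constant model, and its R² on the data it was fit on is 0.0 (toy data below is made up for illustration):

```python
# Sketch: a constant model that always predicts the mean of y gets
# an R^2 score of 0.0, matching the quoted definition (u == v).
import numpy as np
from sklearn.dummy import DummyRegressor

X = np.arange(6).reshape(-1, 1)
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])

constant = DummyRegressor(strategy="mean").fit(X, y)
print(constant.score(X, y))  # 0.0 -- the R^2 baseline
```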
Upvotes: 5
Reputation: 1
Syntax: sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)
In multilabel classification, this function computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true.
Parameters: y_true : 1d array-like, or label indicator array / sparse matrix Ground truth (correct) labels.
y_pred: 1d array-like, or label indicator array / sparse matrix Predicted labels, as returned by a classifier.
normalize : bool, optional (default=True) If False, return the number of correctly classified samples. Otherwise, return the fraction of correctly classified samples.
sample_weight : array-like of shape = [n_samples], optional Sample weights.
Returns:
score : float
If normalize == True, return the fraction of correctly classified samples (float), else returns the number of correctly classified samples (int).
The best performance is 1 with normalize == True and the number of samples with normalize == False.
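A quick illustration of the normalize parameter on a small hand-made example:

```python
# Sketch: accuracy_score returns a fraction by default, or a raw count
# of correct predictions with normalize=False. Labels are toy values.
from sklearn.metrics import accuracy_score

y_true = [0, 1, 2, 3]
y_pred = [0, 2, 2, 3]  # 3 of 4 labels are correct

print(accuracy_score(y_true, y_pred))                   # 0.75 (fraction)
print(accuracy_score(y_true, y_pred, normalize=False))  # 3 (count)
```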
For more information you can refer to: https://scikit-learn.org/stable/modules/model_evaluation.html#accuracy-score
Upvotes: 0
Reputation: 222471
This is classifier-dependent. Each classifier provides its own scoring function.
Estimator score method: Estimators have a score method providing a default evaluation criterion for the problem they are designed to solve. This is not discussed on this page, but in each estimator’s documentation.
Apart from the documentation given to you in one of the answers, the only additional thing you can do is read what parameters your estimator's score method accepts. For example, the SVM classifier SVC has the following signature: score(X, y, sample_weight=None)
Upvotes: 2
Reputation: 363507
It takes a feature matrix X_test and the expected target values y_test. Predictions for X_test are compared with y_test, and either accuracy (for classifiers) or the R² score (for regression estimators) is returned.
This is stated very explicitly in the docstrings for score methods. The one for classification reads:
Returns the mean accuracy on the given test data and labels.
Parameters
----------
X : array-like, shape = (n_samples, n_features)
Test samples.
y : array-like, shape = (n_samples,)
True labels for X.
sample_weight : array-like, shape = [n_samples], optional
Sample weights.
Returns
-------
score : float
Mean accuracy of self.predict(X) wrt. y.
and the one for regression is similar.
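In other words, for a classifier score(X, y) is just the mean accuracy of self.predict(X) with respect to y. A small sketch (with invented toy data) showing the equivalence to accuracy_score:

```python
# Sketch: clf.score(X, y) equals accuracy_score(y, clf.predict(X)).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Two well-separated clusters of toy points, one per class.
X = np.array([[0], [1], [2], [3], [10], [11], [12], [13]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# score() and accuracy_score() report the same mean accuracy.
print(clf.score(X, y) == accuracy_score(y, clf.predict(X)))  # True
```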
Upvotes: 31
Reputation: 32094
Not sure that I understood your question correctly. Obviously, to compute some error or similarity, most scoring functions receive an array of reference values (y_true) and an array of values predicted by your model (y_score) as their main parameters, but they may also receive other parameters specific to the metric. Scoring functions usually do not need the X values.
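That common (y_true, y_pred) pattern can be sketched with two of the standard regression metrics (the numbers below are made up for illustration):

```python
# Sketch: metric functions take y_true and y_pred (no X), plus
# metric-specific keyword arguments.
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

print(mean_absolute_error(y_true, y_pred))  # 0.5
print(mean_squared_error(y_true, y_pred))   # 0.375
```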
I would suggest looking into the source code of the scoring functions to understand how they work.
Here is a list of scoring functions in scikit-learn.
Upvotes: 2