Reputation: 4063
I am trying to evaluate a relevance of features and I am using DecisionTreeRegressor()
The related part of the code is presented below:
# TODO: Make a copy of the DataFrame, using the 'drop' function to drop the given feature
new_data = data.drop(['Frozen'], axis = 1)
# TODO: Split the data into training and testing sets(0.25) using the given feature as the target
# TODO: Set a random state.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(new_data, data['Frozen'], test_size = 0.25, random_state = 1)
# TODO: Create a decision tree regressor and fit it to the training set
from sklearn.tree import DecisionTreeRegressor
regressor = DecisionTreeRegressor(random_state=1)
regressor.fit(X_train, y_train)
# TODO: Report the score of the prediction using the testing set
from sklearn.model_selection import cross_val_score
#score = cross_val_score(regressor, X_test, y_test)
score = regressor.score(X_test, y_test)
print score # python 2.x
When I run the print
function, it returns the given score:
-0.649574327334
You can find the score function implementatioin and some explanation below here and below:
Returns the coefficient of determination R^2 of the prediction. ... The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse).
I could not grasp the whole concept yet, so this explanation is not very helpful for me. For instance I could not understand why score could be negative and what exactly it indicates (if something is squared, I would expect it can only be positive).
What does this score indicates and why can it be negative?
If you know any article (for starters) it might be helpful as well!
Upvotes: 5
Views: 18155
Reputation: 51
R^2
can be negative from its definition (https://en.wikipedia.org/wiki/Coefficient_of_determination) if the model fits the data worse than a horizontal line. Basically
R^2 = 1 - SS_res/SS_tot
and SS_res
and SS_tot
are always positive. If SS_res >> SS_tot
, you have a negative R^2
. Look at this answer as well: https://stats.stackexchange.com/questions/12900/when-is-r-squared-negative
Upvotes: 5
Reputation: 1173
The article execute cross_val_score
in which DecisionTreeRegressor
is implemented. You may take a look at the documentation of scikitlearn DecisionTreeRegressor.
Basically, the score you see is R^2, or (1-u/v). U is the squared sum residual of your prediction, and v is the total square sum(sample sum of square).
u/v can be arbitrary large when you make really bad prediction, while it can only be as small as zero given u and v are sum of squared residual(>=0)
Upvotes: 0