Reputation: 1
I am currently using scikit-learn's DecisionTreeRegressor to fit a regression tree. The problem I'm facing is that running the algorithm on the same data as six months ago produces a slightly different output (i.e. the optimal split point). My guess is that they may have slightly changed the way the MSE criterion is computed, or something like that. Does anybody know?
Upvotes: 0
Views: 313
Reputation: 3457
DecisionTreeRegressor exhibits random behavior unless you specify a random_state as an argument of the constructor. The description of random_state in the documentation explains where randomness can affect your run - see especially the part I highlighted in bold:
random_state : int, RandomState instance or None, default=None
Controls the randomness of the estimator. The features are always randomly permuted at each split, even if splitter is set to "best". When max_features < n_features, the algorithm will select max_features at random at each split before finding the best split among them. But the best found split may vary across different runs, even if max_features=n_features. That is the case, if the improvement of the criterion is identical for several splits and one split has to be selected at random. To obtain a deterministic behaviour during fitting, random_state has to be fixed to an integer. See Glossary for details.
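A minimal sketch of the fix (the data here is synthetic, generated with make_regression, not your dataset): fixing random_state in the constructor makes the fitted tree, and therefore its split thresholds, identical across runs.

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data, just for illustration
X, y = make_regression(n_samples=100, n_features=5, random_state=0)

# With a fixed random_state the fit is fully reproducible
tree_a = DecisionTreeRegressor(random_state=42).fit(X, y)
tree_b = DecisionTreeRegressor(random_state=42).fit(X, y)

# Both trees choose identical split thresholds at every node
print((tree_a.tree_.threshold == tree_b.tree_.threshold).all())
```

Without random_state, ties in the criterion improvement are broken at random, so two runs (or two library versions) can legitimately pick different but equally good splits.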
Upvotes: 2