vasilis polimenis

Reputation: 1

Scikit Learn DecisionTreeRegressor algorithm not consistent

I am currently using decision trees (Scikit-Learn's DecisionTreeRegressor) to fit a regression tree. The problem I'm facing is that, running the algorithm on the same data as 6 months ago, there is a slight change in the output (i.e. the optimal split point). My guess is that they may have slightly changed the way the mse criterion is computed, or something like that. Does anybody know?

Upvotes: 0

Views: 313

Answers (1)

dataista

Reputation: 3457

DecisionTreeRegressor exhibits random behavior unless you specify a random_state as an argument of the constructor.

The random_state entry in the documentation explains exactly where randomness can affect your run; note in particular the part about the best split varying across runs:

random_state int, RandomState instance or None, default=None

Controls the randomness of the estimator. The features are always randomly permuted at each split, even if splitter is set to "best". When max_features < n_features, the algorithm will select max_features at random at each split before finding the best split among them. But the best found split may vary across different runs, even if max_features=n_features. That is the case, if the improvement of the criterion is identical for several splits and one split has to be selected at random. To obtain a deterministic behaviour during fitting, random_state has to be fixed to an integer. See Glossary for details.
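To see this in practice, here is a minimal sketch (with made-up toy data) showing that two fits with the same random_state produce identical split thresholds, which is what fixing the seed buys you:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data, purely for illustration.
rng = np.random.RandomState(0)
X = rng.rand(100, 5)
y = rng.rand(100)

# Two fits with the same random_state build identical trees,
# so the learned split thresholds match exactly.
a = DecisionTreeRegressor(random_state=42).fit(X, y)
b = DecisionTreeRegressor(random_state=42).fit(X, y)
print(np.array_equal(a.tree_.threshold, b.tree_.threshold))  # True
```

Without random_state, ties between equally good splits may be broken differently from run to run, which is one way the "optimal" split point can shift between executions.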

Upvotes: 2
