Ammastaro

Reputation: 193

How to manually change feature values of decision trees in sklearn?

In scikit-learn, if I have a decision tree pulled from

RandomForestClassifier().estimators_

Is there a way I can manually change some of the features? I can iterate through them using

for estimator in rfc.estimators_:
    for feature in estimator.tree_.feature:
        print(feature)  # feature index used at each node (-2 marks a leaf)

but I would like to manually change the feature in this case. How would I go about this?

Upvotes: 0

Views: 3422

Answers (1)

Gambit1614

Reputation: 8801

If I have understood your question correctly, you want to change the parameters of the decision trees inside the random forest? I am not exactly sure why you would want to do that.

I will break the solution into two parts.

First, we will try to change the parameters of a single decision tree:

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

clf = DecisionTreeClassifier(random_state=0)

iris = load_iris()

clf.fit(iris.data,iris.target)
#DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
#       max_features=None, max_leaf_nodes=None,
#        min_impurity_decrease=0.0, min_impurity_split=None,
#        min_samples_leaf=1, min_samples_split=2,
#        min_weight_fraction_leaf=0.0, presort=False, random_state=0,
#        splitter='best')

#Now extract the parameters
parameters_dt = clf.get_params()

#Now change the parameter you want
parameters_dt['max_depth'] = 3

#Now create a new classifier
new_clf = DecisionTreeClassifier(**parameters_dt)
#DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=3,
#        max_features=None, max_leaf_nodes=None,
#        min_impurity_decrease=0.0, min_impurity_split=None,
#        min_samples_leaf=1, min_samples_split=2,
#        min_weight_fraction_leaf=0.0, presort=False, random_state=0,
#        splitter='best')
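
Keep in mind that a classifier built this way is unfitted, so the changed parameter only takes effect once the tree is trained again. Here is a minimal sketch of the same idea, assuming you prefer `sklearn.base.clone` plus `set_params` over rebuilding from the parameter dict (the name `shallow_clf` is just illustrative):

```python
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# clone() copies the hyperparameters but not the fitted state
shallow_clf = clone(clf).set_params(max_depth=3)

# The new depth limit is only applied when the tree is grown again
shallow_clf.fit(iris.data, iris.target)

# Compare the depth actually reached by each fitted tree
print(clf.tree_.max_depth, shallow_clf.tree_.max_depth)
```

The original `clf` is left untouched, which makes it easy to compare the two fitted trees side by side.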

Now let's get back to the Random Forest:

from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

clf = RandomForestClassifier(max_depth=2, random_state=0)

clf.fit(X, y)

for idx, estimator in enumerate(clf.estimators_):
    #Get the params of the current Decision Tree in the Random Forest
    temp_params = estimator.get_params()

    #Change the params you want
    temp_params['max_depth'] = 3

    #Create a new decision tree and fit it, since an unfitted tree cannot predict
    temp_decision_tree = DecisionTreeClassifier(**temp_params)
    temp_decision_tree.fit(X, y)

    #Replace the old decision tree at the current position
    clf.estimators_[idx] = temp_decision_tree

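Note: This might not give you the classifier you want. Replacing trees this way bypasses the forest's own training procedure (such as bootstrap sampling), so the result is not equivalent to fitting RandomForestClassifier(max_depth=3) directly.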
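
If the goal is literally to overwrite which feature a split node tests (the `tree_.feature` array from the question), the following is a rough sketch. It assumes that `tree_.feature` is a writable view onto the fitted tree's internal node data; that is an implementation detail rather than a documented API, so it may not hold in every scikit-learn version. The choice of node 0 and of the replacement index is purely illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

tree = clf.tree_
root = 0  # node 0 is the root of the tree
old_feature = int(tree.feature[root])

# Pick a different valid column index (illustrative choice)
new_feature = (old_feature + 1) % iris.data.shape[1]

# Assumption: this array shares memory with the tree, so the write sticks
tree.feature[root] = new_feature

print(old_feature, "->", int(clf.tree_.feature[root]))

# The threshold is left untouched, so predictions will generally change
predictions = clf.predict(iris.data)
```

For a forest, the same write could be applied to each `estimator.tree_` in `rfc.estimators_`. The stored thresholds and leaf values are not recomputed, so the edited tree is no longer the one that was learned from the data.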

Upvotes: 1
