Nikita Tsekhanovich

Reputation: 129

How to plot feature_importance for DecisionTreeClassifier?

I need to plot feature_importances_ for a DecisionTreeClassifier. The features are already selected and the model reaches the target results, but my teacher asked me to plot the feature importances to see the weight of each contributing factor. I have no idea how to do it.

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(random_state=12345, max_depth=8, class_weight='balanced')
model.fit(features_train, target_train)
model.feature_importances_

It gives me:

array([0.02927077, 0.3551379 , 0.01647181, ..., 0.03705096, 0.        ,
       0.01626676])

Why is it just an array of numbers and not attached to anything, the way max_depth is?

Upvotes: 2

Views: 7213

Answers (2)

Hoài Lâm

Reputation: 81

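Feature importances represent the effect of each feature on the outcome variable. The greater the value, the more that feature affects the outcome. The array holds one value per feature, in the same order as the columns of features_train, which is why you received a plain array. For plotting, you can do: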

import matplotlib.pyplot as plt
import pandas as pd

# One row per feature, labelled with the training column names
feat_importances = pd.DataFrame(model.feature_importances_, index=features_train.columns, columns=["Importance"])
feat_importances.sort_values(by='Importance', ascending=False, inplace=True)
feat_importances.plot(kind='bar', figsize=(8, 6))
plt.show()
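
To see why the attribute is a bare array, it can help to pair each value with its column name by hand. The following is a minimal sketch on synthetic data; the column names (feat_0 to feat_5) and the make_classification settings are illustrative assumptions, not the asker's dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for features_train / target_train (assumed data)
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           random_state=0)
features_train = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(6)])
target_train = pd.Series(y)

model = DecisionTreeClassifier(random_state=12345, max_depth=8,
                               class_weight="balanced")
model.fit(features_train, target_train)

# feature_importances_ has one entry per column, in column order,
# so the column names can be attached as the index
importances = pd.Series(model.feature_importances_,
                        index=features_train.columns)
print(importances.sort_values(ascending=False))
```

For a fitted tree with at least one split, the values are non-negative and sum to 1, so each one can be read as a share of the total impurity reduction.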

Upvotes: 2

Ailurophile

Reputation: 3005

Feature importance refers to a class of techniques that assign scores to the input features of a predictive model, indicating the relative importance of each feature when making a prediction.

Feature importance scores can be calculated for problems that involve predicting a numerical value, called regression, and those problems that involve predicting a class label, called classification.

Load the feature importances into a pandas Series indexed by your DataFrame column names, then use its plot method.

From the scikit-learn documentation:

Feature importances are provided by the fitted attribute feature_importances_ and they are computed as the mean and standard deviation of accumulation of the impurity decrease within each tree.

How are feature_importances in RandomForestClassifier determined?
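
The quoted sentence describes a forest, where each tree has its own importance vector; a single DecisionTreeClassifier has only one such vector. A rough sketch of the mean and standard deviation across trees, on synthetic data (the dataset and n_estimators=50 are assumptions for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only to have something to fit (assumed)
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           random_state=0)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X, y)

# One importance vector per fitted tree: shape (n_trees, n_features)
per_tree = np.array([tree.feature_importances_
                     for tree in forest.estimators_])

mean_importance = per_tree.mean(axis=0)  # average impurity-based importance
std_importance = per_tree.std(axis=0)    # spread between trees

print(mean_importance)
print(std_importance)
```

The standard deviation is often drawn as error bars on the bar chart, for example through the yerr argument of the pandas plot method.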

For your example:

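import pandas as pd
# Index by the training columns so each score keeps its feature name
feat_importances = pd.Series(model.feature_importances_, index=features_train.columns)
feat_importances.nlargest(5).plot(kind='barh')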
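
A fuller version of the same plot, as a sketch that runs on its own: synthetic data stands in for features_train, and the column names, axis label, title and output file name are arbitrary choices made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the asker's training data (assumed)
X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           random_state=0)
features_train = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(8)])

model = DecisionTreeClassifier(random_state=12345, max_depth=8,
                               class_weight="balanced")
model.fit(features_train, y)

feat_importances = pd.Series(model.feature_importances_,
                             index=features_train.columns)

# Take the five largest, then sort ascending so the biggest bar is on top
top5 = feat_importances.nlargest(5).sort_values()

fig, ax = plt.subplots(figsize=(8, 6))
top5.plot(kind="barh", ax=ax)
ax.set_xlabel("Importance")
ax.set_title("Top 5 feature importances")
fig.tight_layout()
fig.savefig("feature_importances.png")  # or plt.show() in an interactive session
```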

More ways to plot feature importances: Random Forest Feature Importance Chart using Python

Upvotes: 0
