Reputation: 129
I need to plot feature_importances_ for a DecisionTreeClassifier. The features are already selected and the target results are achieved, but my teacher told me to plot the feature importances to see the weights of the contributing factors. I have no idea how to do it.
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier(random_state=12345, max_depth=8, class_weight='balanced')
model.fit(features_train, target_train)
model.feature_importances_
It gives me:
array([0.02927077, 0.3551379 , 0.01647181, ..., 0.03705096, 0. ,
0.01626676])
Why is it just an array of numbers rather than being attached to anything, the way max_depth is?
Upvotes: 2
Views: 7213
Reputation: 81
Feature importances represent the effect of each feature on the outcome variable: the greater the value, the more that feature affects the outcome. The array holds one value per feature, in the same order as the columns of features_train. For plotting, you can do:
import matplotlib.pyplot as plt
import pandas as pd
# One importance per column of features_train, in the same order
feat_importances = pd.DataFrame(model.feature_importances_, index=features_train.columns, columns=["Importance"])
feat_importances.sort_values(by='Importance', ascending=False, inplace=True)
feat_importances.plot(kind='bar', figsize=(8,6))
plt.show()
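For anyone who wants to try this without the asker's data, here is a minimal self-contained sketch of the same idea. The iris dataset is only a stand-in (an assumption, not the data from the question); the model settings are copied from the question. It shows that each number in the array lines up with one column name.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): iris loaded as a DataFrame so the columns have names.
iris = load_iris(as_frame=True)
features_train, target_train = iris.data, iris.target

model = DecisionTreeClassifier(random_state=12345, max_depth=8, class_weight="balanced")
model.fit(features_train, target_train)

# feature_importances_ has one entry per column, in the order of features_train.columns.
feat_importances = pd.Series(model.feature_importances_, index=features_train.columns)
feat_importances = feat_importances.sort_values()

# Horizontal bars keep long feature names readable.
ax = feat_importances.plot(kind="barh", figsize=(8, 6))
ax.set_xlabel("Importance")
plt.tight_layout()
# In a script, call plt.show() here; notebooks render the figure automatically.
```

If features_train is a NumPy array rather than a DataFrame, there are no column names to use, so you would have to supply your own list of names as the index.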
Upvotes: 2
Reputation: 3005
Feature importance refers to a class of techniques that assign scores to the input features of a predictive model, indicating the relative importance of each feature when making a prediction.
Feature importance scores can be calculated both for problems that involve predicting a numerical value (regression) and for problems that involve predicting a class label (classification).
Load the feature importances into a pandas Series indexed by your DataFrame column names, then use its plot method.
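From the scikit-learn documentation (the passage describes a forest of trees):
"Feature importances are provided by the fitted attribute feature_importances_ and they are computed as the mean and standard deviation of accumulation of the impurity decrease within each tree."
For a single DecisionTreeClassifier there is only one tree, so feature_importances_ is just that tree's normalized total impurity decrease per feature.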
See also: How are feature_importances in RandomForestClassifier determined?
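To make "impurity decrease" concrete, here is a rough sketch that recomputes the importances of a single decision tree by hand from the fitted model.tree_ arrays and compares them with feature_importances_. The iris data is a stand-in (an assumption), and the loop is my own illustration of the idea, not code taken from scikit-learn.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in data (assumption)
model = DecisionTreeClassifier(random_state=12345, max_depth=8).fit(X, y)

tree = model.tree_
manual = np.zeros(X.shape[1])
for node in range(tree.node_count):
    left, right = tree.children_left[node], tree.children_right[node]
    if left == -1:  # leaf node: no split, so no impurity decrease
        continue
    # Weighted impurity of the node minus the weighted impurity of its two children.
    decrease = (
        tree.weighted_n_node_samples[node] * tree.impurity[node]
        - tree.weighted_n_node_samples[left] * tree.impurity[left]
        - tree.weighted_n_node_samples[right] * tree.impurity[right]
    )
    # Credit the decrease to the feature this node splits on.
    manual[tree.feature[node]] += decrease

manual /= manual.sum()  # normalise so the importances sum to 1
print(np.allclose(manual, model.feature_importances_))
```

A feature that is never used in a split keeps an importance of exactly 0, which is why the array in the question contains a 0.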
For your example:
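import pandas as pd
# Index by the columns the model was trained on (the target column must not be included)
feat_importances = pd.Series(model.feature_importances_, index=features_train.columns)
feat_importances.nlargest(5).plot(kind='barh')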
More ways to plot feature importances: Random Forest Feature Importance Chart using Python
Upvotes: 0