DataAlgo

Reputation: 1

Number of feature_importances_ does not match number of features in scikit-learn's DecisionTreeClassifier

I fitted a decision tree to a dataset with 20 inputs and 1 categorical output using the following Python code (wordsDatum is just an array containing the inputs in columns 0 to 19 and the output in column 20):

from sklearn import tree

clsfr = tree.DecisionTreeClassifier(max_depth=2, min_samples_leaf=50)
clsfr = clsfr.fit(wordsDatum[:, 0:19], wordsDatum[:, 20])
for items in clsfr.feature_importances_:
    print(items)

When I print the feature importances, I only get 19 values, which is strange considering I have 20 features. Any ideas what might be going on here?

Thanks for your help!

Upvotes: 0

Views: 828

Answers (2)

pythonCodeHelp

Reputation: 1

Thanks for your response! Yes, Python slicing has this quirk of including the lower limit but excluding the upper limit of the range.

Upvotes: 0

oxtay

Reputation: 4072

This is due to how slicing works in Python. You can find some good insights on this here.

But in summary, if you define a list like this:

my_list = [0, 1, 2, 3, 4, 5]

and you call my_list[0:5], it will give you:

[0, 1, 2, 3, 4]
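More generally, a Python slice is half-open: my_list[i:j] includes index i but excludes index j, so it always contains j - i elements. For example:

my_list = [0, 1, 2, 3, 4, 5]
# Half-open slicing: the start index is included, the stop index is not.
print(len(my_list[0:5]))  # 5 -- my_list[5] is left out
print(len(my_list[0:6]))  # 6 -- to get all six elements, the stop must be 6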

So if you change the fit call in your code to:

clsfr = clsfr.fit(wordsDatum[:, 0:20], wordsDatum[:, 20])

It will do what you expect: the slice 0:20 covers columns 0 through 19, i.e. all twenty features.
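To make the off-by-one concrete, here is a minimal sketch with a synthetic array standing in for wordsDatum (the row count, random feature values, and binary labels are assumptions, chosen only to match the 20-features-plus-label layout from the question):

import numpy as np
from sklearn import tree

# Synthetic stand-in for wordsDatum: 200 rows, 20 feature columns plus 1 label column
rng = np.random.default_rng(0)
wordsDatum = np.column_stack([rng.random((200, 20)), rng.integers(0, 2, 200)])

print(wordsDatum[:, 0:19].shape)  # (200, 19) -- silently drops column 19, the 20th feature
print(wordsDatum[:, 0:20].shape)  # (200, 20) -- all twenty features

clsfr = tree.DecisionTreeClassifier(max_depth=2, min_samples_leaf=50)
clsfr = clsfr.fit(wordsDatum[:, 0:20], wordsDatum[:, 20])
print(len(clsfr.feature_importances_))  # 20, one importance per feature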

Upvotes: 1
