Reputation: 1
I fitted a decision tree to a dataset with 20 inputs and 1 categorical output using the following Python code (wordsDatum is just an array containing the inputs in columns 0 to 19 and the output in column 20):
from sklearn import tree
clsfr = tree.DecisionTreeClassifier(max_depth=2, min_samples_leaf=50)
clsfr = clsfr.fit(wordsDatum[:, 0:19], wordsDatum[:, 20])
for items in clsfr.feature_importances_:
    print(items)
When I print the feature importances, I only get 19 values - this is strange considering I have 20 features. Any ideas what might be going on here?
Thanks for your help!
Upvotes: 0
Views: 828
Reputation: 1
Thanks for your response! Yes, Python seems to have this quirk (?) of including the lower limit but excluding the upper limit of the range.
Upvotes: 0
Reputation: 4072
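This is due to how slicing works in Python: a slice start:stop includes the start index but excludes the stop index, so 0:19 selects only columns 0 to 18. You can find some good insights on this here.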
But in summary, if you define a list like this:
my_list = [0, 1, 2, 3, 4, 5]
and you call my_list[0:5], it will give you:
[0, 1, 2, 3, 4]
So if you change the second line of your code to:
clsfr=clsfr.fit(wordsDatum[:,0:20],wordsDatum[:,20])
It will do what you expect: the slice 0:20 covers columns 0 to 19, so all twenty features are included and feature_importances_ will have twenty values.
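If it helps, here is a small sketch you can run to check the counts. It uses a plain Python list as a hypothetical stand-in for one row of wordsDatum (20 inputs followed by 1 output); the names row, too_few, all_inputs and also_inputs are made up for illustration. NumPy column slices such as wordsDatum[:, 0:20] follow the same half-open convention.

```python
# Hypothetical stand-in for one row of wordsDatum:
# 20 inputs (positions 0..19) followed by 1 output (position 20).
row = list(range(21))

too_few = row[0:19]      # stop index 19 is excluded, so this is positions 0..18
all_inputs = row[0:20]   # positions 0..19, i.e. all twenty inputs
also_inputs = row[:-1]   # everything except the last position
target = row[20]         # the output column

print(len(too_few))               # 19
print(len(all_inputs))            # 20
print(all_inputs == also_inputs)  # True
```

Writing the slice as [:-1] (or [:, :-1] on a 2D array) can be a bit safer than hard-coding 20, since it keeps working if the number of input columns changes.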
Upvotes: 1