Reputation:
import sklearn
import sklearn.datasets
import sklearn.ensemble
import numpy as np
from treeinterpreter import treeinterpreter as ti

iris = sklearn.datasets.load_iris()
rf = sklearn.ensemble.RandomForestClassifier(n_estimators=500, random_state=50)
rf.fit(iris.data, iris.target)
instances = iris.data[100].reshape(1, -1)
prediction, biases, contributions = ti.predict(rf, instances)
for i in range(len(instances)):
    for c, feature in sorted(zip(contributions[i],
                                 iris.feature_names),
                             key=lambda x: ~abs(x[0].any())):
        print(feature, c)
I am trying to print the name of the feature with the maximum value in this list, but I get True
instead. Any ideas why, and how to fix this?
You can copy and paste the code to run it in your environment.
I slightly modified the question: I want to print the name of the column that holds the maximum value, rather than the maximum value itself.
The output is
Feature contributions:
--------------------
sepal length (cm) [-0.046 -0.01 0.057]
sepal width (cm) [-0. -0. 0.]
petal length (cm) [-0.136 -0.153 0.289]
petal width (cm) [-0.148 -0.171 0.319]
The output I am hoping for
petal width (cm)
Upvotes: 0
Views: 76
Reputation: 698
You should be using c.max() instead of c.all() if you want the maximum element of the array, since c.all() reduces the array to a single boolean. This section of code should give you what you want:
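maxFeatures = []
for i in range(len(instances)):
    # Start below any possible contribution so negative values are handled too.
    maxList = float('-inf')
    maxFeature = ''
    # contributions[i] holds one array of per-class contributions per feature.
    # The sort from the question is not needed to find the maximum.
    for c, feature in zip(contributions[i], iris.feature_names):
        if c.max() > maxList:
            maxList = c.max()
            maxFeature = feature
        print(feature, c)
    maxFeatures.append(maxFeature)
print(maxFeatures)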
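If you only need the name and not the per-feature printout, the loop can probably be collapsed into a single max() call. Below is a minimal sketch of that idea. To keep it runnable without training a model, it uses the rounded contribution values copied from the output in the question rather than the real ti.predict result, and top_feature is a hypothetical helper name, not part of treeinterpreter.

```python
# Rounded contribution rows copied from the output shown in the question
# (one row per feature, one value per class). Stand-in data, not a live model.
feature_names = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]
contributions = [
    [-0.046, -0.01, 0.057],
    [-0.0, -0.0, 0.0],
    [-0.136, -0.153, 0.289],
    [-0.148, -0.171, 0.319],
]


def top_feature(rows, names):
    """Hypothetical helper: name of the feature whose row holds the largest value."""
    # Compare the (row, name) pairs by the largest entry in each row.
    _, best_name = max(zip(rows, names), key=lambda pair: max(pair[0]))
    return best_name


# any() only reports whether a row contains a non-zero entry, so it yields a bool.
print(any(contributions[3]))
# The name of the feature with the largest single contribution.
print(top_feature(contributions, feature_names))
```

With the real model output, the same helper should work per instance as top_feature(contributions[i], iris.feature_names), assuming contributions[i] is laid out as one row of per-class values per feature, as in the printed output above.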
Upvotes: 1