Reputation: 20117
The documentation on SVMs implies that an attribute called classes_ exists, which allegedly reveals how the model represents classes internally.
I would like to get that information in order to interpret the output of functions like predict_proba, which generates class probabilities for a number of samples. Hopefully, given some illustrative values:
model.classes_
>>> [1, 2, 4]
means that I can assume that this holds:
model.predict_proba([[1.2312, 0.23512, 6.01234], [3.7655, 8.2353, 0.86323]])
>>> [[0.032, 0.143, 0.825], [0.325, 0.143, 0.532]]
Probabilities should translate to the same order as the classes, i.e. for the first set of features I can assume:
probability of class 1: 0.032
probability of class 2: 0.143
probability of class 4: 0.825
But accessing classes_ on an SVM results in an error. Is there a good way to get that information? I can't imagine that it's not accessible any more after the model is trained.
edit: The way I build my model is more or less like this:
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search in releases before 0.18
from sklearn.pipeline import Pipeline, FeatureUnion
pipeline = Pipeline([
    ('features', FeatureUnion(transformer_list=[ ... ])),
    ('svm', SVC(probability=True))
])
parameters = { ... }
grid_search = GridSearchCV(
    pipeline,
    parameters
)
grid_search.fit(get_data(), get_labels())
clf = [elem for elem in grid_search.estimator.steps if elem[0] == 'svm'][0][1]
print(clf)
>> SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
kernel='rbf', max_iter=-1, probability=True, random_state=None,
shrinking=True, tol=0.001, verbose=False)
print(clf.classes_)
>> Traceback (most recent call last):
File "path/to/script.py", line 284, in <module>
File "path/to/script.py", line 181, in re_train
print(clf.classes_)
AttributeError: 'SVC' object has no attribute 'classes_'
Upvotes: 4
Views: 10035
Reputation: 1
I believe this should do the trick:
import pandas as pd
arr = model.predict_proba(X)
list1 = arr.tolist()
cls = model.classes_
list2 = cls.tolist()
# list1[0] holds the probabilities for the first sample in X
d = {'Category': list2, 'Probability': list1[0]}
df = pd.DataFrame(d)
print(df)
Upvotes: 0
Reputation: 28748
The grid_search.estimator that you are looking at is the unfitted pipeline. The classes_ attribute only exists after fitting, as the classifier needs to have seen y.
What you want is the estimator that was trained using the best parameter settings, which is grid_search.best_estimator_.
The following will work:
clf = grid_search.best_estimator_.named_steps['svm']
print(clf.classes_)
[and classes_ does exactly what you think it does].
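To illustrate the difference between the two attributes, here is a minimal sketch. The toy data, the StandardScaler step (standing in for the question's elided FeatureUnion) and the small parameter grid are assumptions made up for the example, not part of the original setup.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical toy data: three well-separated clusters labelled 1, 2 and 4.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(10, 2)) for c in (0.0, 3.0, 6.0)])
y = np.repeat([1, 2, 4], 10)

# StandardScaler is only a placeholder for the question's feature step.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("svm", SVC(probability=True)),
])
grid_search = GridSearchCV(pipeline, {"svm__C": [0.1, 1.0]}, cv=2)
grid_search.fit(X, y)

# grid_search.estimator is the original pipeline; the search fits clones of it.
unfitted_svm = grid_search.estimator.named_steps["svm"]
# best_estimator_ is the pipeline refitted with the best parameters.
fitted_svm = grid_search.best_estimator_.named_steps["svm"]

has_classes_before = hasattr(unfitted_svm, "classes_")
has_classes_after = hasattr(fitted_svm, "classes_")
print(has_classes_before, has_classes_after)
print(fitted_svm.classes_)
```

The hasattr check is just a quick way to see which of the two objects has actually been fitted.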
Upvotes: 3
Reputation: 2415
sklearn does have a classes_ attribute, so the error probably means you were accessing it on the wrong model. In the example below, classes_ is available once the classifier has been fitted:
>>> import numpy as np
>>> from sklearn.svm import SVC
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> y = np.array([1, 1, 2, 2])
>>> clf = SVC(probability=True)
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
kernel='rbf', max_iter=-1, probability=True, random_state=None,
shrinking=True, tol=0.001, verbose=False)
>>> print(clf.classes_)
[1 2]
>>> print(clf.predict([[-0.8, -1]]))
[1]
>>> print(clf.predict_proba([[-0.8, -1]]))
[[ 0.92419129 0.07580871]]
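Since the columns of predict_proba follow the order of classes_, the two can be zipped together to label each probability. A rough sketch, assuming made-up toy data with the labels 1, 2 and 4 from the question (the data and query points are hypothetical):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: three clusters labelled 1, 2 and 4.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(10, 2)) for c in (0.0, 3.0, 6.0)])
y = np.repeat([1, 2, 4], 10)

clf = SVC(probability=True, random_state=0).fit(X, y)

# One row per sample, one column per entry of clf.classes_.
proba = clf.predict_proba([[0.1, -0.2], [5.8, 6.1]])

# Pair each column with its class label, one dict per sample.
labelled = [dict(zip(clf.classes_.tolist(), row.tolist())) for row in proba]
for row in labelled:
    print(row)
```

Each printed dict maps a class label to its estimated probability for that sample.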
Upvotes: 1