Arne

Reputation: 20117

Identifying a sklearn-model's classes

The documentation on SVMs implies that an attribute called classes_ exists, which allegedly reveals how the model represents classes internally.

I would like to get that information in order to interpret the output of functions like predict_proba, which returns class probabilities for a number of samples. Ideally, given some illustrative values, knowing that:

>>> model.classes_
[1, 2, 4]

means that I can assume that this holds:

>>> model.predict_proba([[1.2312, 0.23512, 6.01234], [3.7655, 8.2353, 0.86323]])
[[0.032, 0.143, 0.825], [0.325, 0.143, 0.532]]

The probabilities should be in the same order as the classes, i.e. for the first set of features I can assume:

probability of class 1: 0.032
probability of class 2: 0.143
probability of class 4: 0.825

But accessing classes_ on an SVM raises an error. Is there a good way to get that information? I can't imagine that it is no longer accessible after the model has been trained.
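A minimal sketch of the behaviour described above, assuming a recent scikit-learn and a toy dataset invented here purely for illustration (labels 1, 2 and 4, mirroring the example values): once an SVC has been fitted, classes_ exists, and column j of predict_proba holds the probability of classes_[j].

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: three well-separated clusters labelled 1, 2 and 4
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 3)),
    rng.normal(loc=3.0, scale=0.3, size=(20, 3)),
    rng.normal(loc=6.0, scale=0.3, size=(20, 3)),
])
y = np.array([1] * 20 + [2] * 20 + [4] * 20)

# probability=True enables predict_proba (Platt scaling, fitted during fit)
clf = SVC(probability=True, random_state=0)
clf.fit(X, y)

# classes_ only exists after fit; the labels are stored in sorted order
print(clf.classes_)

proba = clf.predict_proba(X[:2])
# Column j of proba is the probability of class clf.classes_[j]
for row in proba:
    print(dict(zip(clf.classes_.tolist(), row.round(3).tolist())))
```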


Edit: this is roughly how I build my model:

from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV  # sklearn.model_selection in newer releases (grid_search was removed in 0.20)
from sklearn.pipeline import Pipeline, FeatureUnion


pipeline = Pipeline([
    ('features', FeatureUnion(transformer_list=[ ... ])),
    ('svm', SVC(probability=True))
])
parameters = { ... }
grid_search = GridSearchCV(
    pipeline,
    parameters
)

grid_search.fit(get_data(), get_labels())
clf = [elem for elem in grid_search.estimator.steps if elem[0] == 'svm'][0][1]

print(clf)
>> SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=True, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
print(clf.classes_)
>> Traceback (most recent call last):
  File "path/to/script.py", line 284, in <module>
  File "path/to/script.py", line 181, in re_train
    print(clf.classes_)
AttributeError: 'SVC' object has no attribute 'classes_'

Upvotes: 4

Views: 10035

Answers (3)

Tiddu

Reputation: 1

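I believe this should do the trick:

import pandas as pd

# One row of probabilities per sample; columns follow the order of model.classes_
arr = model.predict_proba(X)
list1 = arr.tolist()

cls = model.classes_
list2 = cls.tolist()

# Pair each class label with its probability for the first sample
d = {'Category': list2, 'Probability': list1[0]}
df = pd.DataFrame(d)

print(df)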
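The snippet above only tabulates the first sample (list1[0]). A hedged variant, assuming pandas is installed and using hypothetical blob data with string labels, passes classes_ as the column names so that every sample gets one row and every class one labelled column:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical data: three Gaussian blobs, relabelled with strings
X, y_idx = make_blobs(n_samples=60, centers=3, random_state=0)
y = np.array(["cat", "dog", "fish"])[y_idx]

model = SVC(probability=True, random_state=0).fit(X, y)

# One row per sample, one column per class, in the order of model.classes_
df = pd.DataFrame(model.predict_proba(X[:5]), columns=model.classes_)
print(df.round(3))
```

Labelling the columns with classes_ keeps the mapping explicit, so nothing downstream has to rely on positional order.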

Upvotes: 0

Andreas Mueller

Reputation: 28748

The grid_search.estimator that you are looking at is the unfitted pipeline. The classes_ attribute only exists after fitting, as the classifier needs to have seen y.

What you want is the estimator that was trained using the best parameter settings, which is grid_search.best_estimator_.

The following will work:

clf = grid_search.best_estimator_.named_steps['svm']
print(clf.classes_)

[and classes_ does exactly what you think it does].
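To see the difference end to end, here is a small sketch assuming a recent scikit-learn (GridSearchCV imported from sklearn.model_selection). The data, the StandardScaler step standing in for the FeatureUnion, and the parameter grid are all hypothetical stand-ins for get_data(), get_labels() and parameters:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical data standing in for get_data() / get_labels()
X, y = make_classification(n_samples=90, n_features=5, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)
y = np.array([1, 2, 4])[y]  # relabel to match the question's classes

pipeline = Pipeline([
    ("scale", StandardScaler()),  # stands in for the FeatureUnion step
    ("svm", SVC(probability=True, random_state=0)),
])
grid_search = GridSearchCV(pipeline, {"svm__C": [0.1, 1.0, 10.0]}, cv=3)
grid_search.fit(X, y)

# GridSearchCV fits clones, so the template estimator is never fitted
print(hasattr(grid_search.estimator.named_steps["svm"], "classes_"))

# best_estimator_ is refitted on all the data (refit=True is the default)
clf = grid_search.best_estimator_.named_steps["svm"]
print(clf.classes_)
```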

Upvotes: 3

chappers

Reputation: 2415

There is a classes_ attribute in sklearn, so you were probably looking at the wrong model (for example, one that has not been fitted). In the example below you can see the classes in the classes_ attribute after fitting:

>>> import numpy as np
>>> from sklearn.svm import SVC
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> y = np.array([1, 1, 2, 2])
>>> clf = SVC(probability=True)
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=True, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
>>> print(clf.classes_)
[1 2]
>>> print(clf.predict([[-0.8, -1]]))
[1]
>>> print(clf.predict_proba([[-0.8, -1]]))
[[ 0.92419129  0.07580871]]
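One more detail worth checking, sketched here with the same toy points but with the labels deliberately listed in the order 2, 2, 1, 1 (a hypothetical variation): classes_ is sorted rather than ordered by first appearance in y, so zipping it with a probability row is a safer way to read the result than assuming the order of the training labels.

```python
import numpy as np
from sklearn.svm import SVC

# Same toy points as above, but the labels appear in the order 2, 2, 1, 1
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([2, 2, 1, 1])

clf = SVC(probability=True, random_state=0).fit(X, y)

# classes_ is sorted, not ordered by first appearance in y
print(clf.classes_)

# Pair each probability with its label explicitly instead of relying on position
proba = clf.predict_proba([[-0.8, -1]])[0]
print(dict(zip(clf.classes_.tolist(), proba.tolist())))
```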

Upvotes: 1
