Reputation: 12905
For a DS competition I am using an SVM to do binary classification. Here tdata and vdata both have 256 features. tlabels and vlabels both have dimensions n_samples x 1, and their unique values are 0/1.
As per the competition rules, instead of labels we need to submit a probability score (between 0 and 1), and AUC will be used to decide the ranking.
I am pretty new to SVMs and sklearn. Any pointers on how to convert this code to generate probability scores and AUC would be of great help.
Code:
from sklearn import svm, metrics

classifier = svm.SVC(gamma=g, C=c, kernel='rbf', class_weight='balanced')
classifier.fit(tdata, tlabels)
expected = vlabels
predicted = classifier.predict(vdata)
print("Classification report for classifier %s:\n%s\n"
      % (classifier, metrics.classification_report(expected, predicted)))
cm = metrics.confusion_matrix(expected, predicted)
# correct predictions are on the diagonal of the confusion matrix
accuracy = (cm[0, 0] + cm[1, 1]) * 100.0 / cm.sum()
print("accuracy = " + str(accuracy))
Output:
Classification report for classifier SVC(C=1.0, cache_size=200, class_weight='balanced', coef0=0.0,
decision_function_shape=None, degree=3, gamma=0.00020000000000000001,
kernel='rbf', max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False):
             precision    recall  f1-score   support

        0.0       0.93      0.88      0.90      1881
        1.0       0.92      0.95      0.94      2686

avg / total       0.92      0.92      0.92      4567
accuracy = 92.3144296037
Upvotes: 0
Views: 3850
Reputation: 33147
Step 1
Set probability=True when constructing the SVC (this is a constructor parameter of SVC and defaults to False).
classifier=svm.SVC(gamma=g,C=c,kernel='rbf',class_weight='balanced', probability=True)
Step 2
Then use the predict_proba method.
Example:
classifier.fit(X,y)
classifier.predict_proba(X)
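The result is an array of shape (n_samples, 2) with values in the range [0,1]. The columns follow the order of classifier.classes_, so with 0/1 labels the second column, [:, 1], holds the probability of class 1, which is the score to submit and to use for AUC.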
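For illustration, here is a minimal end-to-end sketch of both steps followed by the AUC computation. The data is synthetic (generated with make_classification as a stand-in for the competition data), and the gamma and C values are untuned placeholders, so the numbers it prints say nothing about the real problem. It uses metrics.roc_auc_score, which computes the AUC directly from labels and scores.

```python
import numpy as np
from sklearn import metrics, svm
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the competition data: 256 features, labels in {0, 1}.
X, y = make_classification(n_samples=600, n_features=256,
                           n_informative=20, random_state=0)
tdata, vdata, tlabels, vlabels = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# gamma and C are placeholder values here, not tuned ones.
classifier = svm.SVC(gamma="scale", C=1.0, kernel="rbf",
                     class_weight="balanced", probability=True,
                     random_state=0)
classifier.fit(tdata, tlabels)

proba = classifier.predict_proba(vdata)           # shape (n_samples, 2)
pos_col = list(classifier.classes_).index(1)      # column belonging to class 1
scores = proba[:, pos_col]                        # values to submit

auc = metrics.roc_auc_score(vlabels, scores)
print("validation AUC = %.4f" % auc)
```

If the labels are stored as an n_samples x 1 column, as in the question, passing tlabels.ravel() to fit avoids a shape warning.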
Hope this helps.
Upvotes: 1
Reputation: 4499
Use the predict_proba function of SVC to obtain probabilities instead of classes.
To use predict_proba on SVC, the parameter probability=True must be given at initialization.
classifier = svm.SVC(gamma=g, C=c, kernel='rbf', class_weight='balanced', probability=True)  # probability=True is required for predict_proba
classifier.fit(tdata, tlabels)
expected = vlabels
predicted = classifier.predict(vdata)
pred_proba = classifier.predict_proba(vdata)  # shape (n_samples, 2), one column per class
proba_one = pred_proba[:, 1]  # probability of class 1
fpr, tpr, thresholds = metrics.roc_curve(expected, proba_one)
print("AUC = " + str(metrics.auc(fpr, tpr)))
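As a quick sanity check on the metric itself, the sketch below computes the AUC two ways on a tiny hand-made example: via roc_curve followed by auc, and via roc_auc_score in one call. The four labels and scores are made up for illustration and are not from the competition data.

```python
import numpy as np
from sklearn import metrics

# Hypothetical toy data: true 0/1 labels and predicted probabilities of class 1.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Two-step route: build the ROC curve, then integrate it.
fpr, tpr, thresholds = metrics.roc_curve(y_true, y_score)
auc_two_step = metrics.auc(fpr, tpr)

# One-step route: same quantity computed directly.
auc_one_step = metrics.roc_auc_score(y_true, y_score)

print(auc_two_step, auc_one_step)  # both 0.75
```

Three of the four positive/negative pairs are ranked correctly (only 0.35 vs 0.4 is inverted), which is where 0.75 comes from.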
Reference:
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.auc.html
http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
Upvotes: 1