simplfuzz

Reputation: 12905

Label prediction to probability score prediction/AUC using scikit-learn SVM

For a data science competition I am using an SVM for binary classification. Here tdata and vdata both have 256 features; tlabels and vlabels both have shape n_samples x 1, and their only unique values are 0 and 1.

Per the competition rules, instead of labels we need to submit a probability score (between 0 and 1), and AUC will be used to decide the ranking.

I am pretty new to SVMs and scikit-learn. Any pointers on how to change this code to produce probability scores and compute the AUC would be a great help.

Code:

from sklearn import svm, metrics

classifier=svm.SVC(gamma=g,C=c,kernel='rbf',class_weight='balanced')
classifier.fit(tdata, tlabels)
expected = vlabels
predicted = classifier.predict(vdata)

print("Classification report for classifier %s:\n%s\n"
      % (classifier, metrics.classification_report(expected, predicted)))
cm = metrics.confusion_matrix(expected, predicted)
accuracy = (cm[0,0]+cm[1,1])*100.0/sum(sum(cm))
print("accuracy = "+str(accuracy))

Output:

Classification report for classifier SVC(C=1.0, cache_size=200, class_weight='balanced', coef0=0.0,
  decision_function_shape=None, degree=3, gamma=0.00020000000000000001,
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False):
             precision    recall  f1-score   support

        0.0       0.93      0.88      0.90      1881
        1.0       0.92      0.95      0.94      2686

avg / total       0.92      0.92      0.92      4567


accuracy = 92.3144296037

Upvotes: 0

Views: 3850

Answers (2)

seralouk

Reputation: 33147

Step 1

Set probability=True when constructing the SVC (see the SVC link below). It defaults to False, and predict_proba is not available without it.

classifier=svm.SVC(gamma=g,C=c,kernel='rbf',class_weight='balanced', probability=True)

Step 2

Then call the predict_proba method on the fitted classifier.

Example:

classifier.fit(X,y)
classifier.predict_proba(X)

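The result is an array of shape (n_samples, n_classes) holding the probabilities you want, each in the range [0,1]. The columns follow the order of classifier.classes_, so with 0/1 labels the second column is the probability of class 1.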
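As a minimal sketch of the two steps above, the snippet below fits an SVC with probability=True and pulls out the positive-class column. The synthetic data from make_classification, the sample and feature counts, and gamma='scale' are assumptions for illustration only; they stand in for the asker's tdata, tlabels, g and c.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification

# Synthetic stand-in for the competition data (assumption, not the real tdata/tlabels).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# probability=True is required for predict_proba to be available.
classifier = svm.SVC(kernel='rbf', gamma='scale', class_weight='balanced',
                     probability=True, random_state=0)
classifier.fit(X, y)

proba = classifier.predict_proba(X)            # shape (n_samples, n_classes)
pos_col = list(classifier.classes_).index(1)   # column holding P(label == 1)
scores = proba[:, pos_col]                     # one score per sample, in [0, 1]
```

Looking the column up through classes_ rather than hard-coding index 1 keeps the code correct if the labels are encoded differently.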

Hope this helps.

SVC link

predict_proba

Upvotes: 1

shanmuga

Reputation: 4499

Use SVC's predict_proba method to obtain probabilities instead of class labels.
For predict_proba to be available, pass probability=True when initializing the SVC.

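classifier = svm.SVC(gamma=g, C=c, kernel='rbf', class_weight='balanced', probability=True)  # probability=True enables predict_proba
classifier.fit(tdata, tlabels)
expected = vlabels
predicted = classifier.predict(vdata)
pred_proba = classifier.predict_proba(vdata)  # shape (n_samples, 2); columns follow classifier.classes_
proba_one = pred_proba[:, 1]  # probability of class 1, the score to submit

# AUC is computed from the true labels and the class-1 scores, not from predicted labels
fpr, tpr, thresholds = metrics.roc_curve(expected, proba_one)
print(metrics.auc(fpr, tpr))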
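For a self-contained check of this approach, here is a hedged sketch on synthetic data. It computes the validation AUC both through roc_curve plus auc and through the roc_auc_score shortcut. The make_classification data, the train/validation split, and gamma='scale' are assumptions standing in for the real tdata, vdata, g and c.

```python
import numpy as np
from sklearn import svm, metrics
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for tdata/tlabels and vdata/vlabels (assumption).
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

classifier = svm.SVC(kernel='rbf', gamma='scale', class_weight='balanced',
                     probability=True, random_state=0)
classifier.fit(X_train, y_train)

# Probability of class 1 for each validation sample.
proba_one = classifier.predict_proba(X_val)[:, 1]

# Route 1: build the ROC curve, then integrate it.
fpr, tpr, thresholds = metrics.roc_curve(y_val, proba_one)
auc_from_curve = metrics.auc(fpr, tpr)

# Route 2: the one-call shortcut.
auc_direct = metrics.roc_auc_score(y_val, proba_one)

# AUC depends only on the ranking of the scores, so the raw
# decision_function output can also be scored this way.
auc_decision = metrics.roc_auc_score(y_val, classifier.decision_function(X_val))

print(auc_from_curve, auc_direct, auc_decision)
```

Both routes should give the same number for the probability scores. The decision_function variant is only useful for model selection here, since the competition asks for scores between 0 and 1.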

Reference:
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.auc.html
http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

Upvotes: 1
