Sklearn SVM gives wrong decision boundary

Question

I am fitting an SVC from sklearn and plotting it with mlxtend's plot_decision_regions function. Please take a look at the code below; my data is just simple 2D points. The plot of the decision function does not make sense to me, because the boundary is closer to one class than the other. Am I doing or interpreting something wrong?

import numpy as np
from sklearn.svm import SVC
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions


f = np.array([[1, 1], [0, 0]])

labels = np.array([1, 0])

model = SVC(kernel='linear')
model.fit(f, labels)

plot_decision_regions(X=f, y=labels, clf=model, legend=2)
plt.ylim([-1, 2])
plt.xlim([-1, 2])
plt.xlabel('feature 1')
plt.ylabel('feature 2')
plt.show()

The result for this data looks boxy, as shown below:

[Figure: decision regions with data from case 1]

If I change the data in f to np.array([[1, 1], [0, 0], [1, 0], [0, 1]]) and the labels to np.array([1, 0, 0, 1]), the result looks like this:

[Figure: decision regions with data from case 2]

Is it because of the plotting library I am using?

BugKiller · Accepted Answer

It seems there is some bug in plot_decision_regions.

Let's use plot_svc_decision_boundary from handson-ml instead, which draws the boundary and margins directly from the fitted coefficients:

import numpy as np
from sklearn.svm import SVC
import matplotlib.pyplot as plt

def plot_svc_decision_boundary(svm_clf, xmin, xmax):
    w = svm_clf.coef_[0]
    b = svm_clf.intercept_[0]

    # At the decision boundary, w0*x0 + w1*x1 + b = 0
    # => x1 = -w0/w1 * x0 - b/w1
    x0 = np.linspace(xmin, xmax, 200)
    decision_boundary = -w[0]/w[1] * x0 - b/w[1]

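    # Vertical offset (along x1) of the margin lines w.x + b = +/-1;
    # like the boundary formula above, this assumes w[1] != 0
    margin = 1/w[1]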
    gutter_up = decision_boundary + margin
    gutter_down = decision_boundary - margin

    svs = svm_clf.support_vectors_
    plt.scatter(svs[:, 0], svs[:, 1], s=180, facecolors='#FFAAAA')
    plt.plot(x0, decision_boundary, "k-", linewidth=2)
    plt.plot(x0, gutter_up, "k--", linewidth=2)
    plt.plot(x0, gutter_down, "k--", linewidth=2)


f = np.array([[1, 1], [0, 0]])

labels = np.array([1, 0])

svm_clf = SVC(kernel='linear')
svm_clf.fit(f, labels)

plot_svc_decision_boundary(svm_clf, -1, 2.0)
plt.ylim([-1, 2])
plt.xlim([-1, 2])
plt.xlabel('feature 1')
plt.ylabel('feature 2')
plt.scatter(f[0, 0], f[0, 1], marker='^', s=80)
plt.scatter(f[1, 0], f[1, 1], marker='s', s=80)
plt.show()
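
To separate the model from the plot, it may also help to inspect the fitted coefficients numerically. The following is a minimal sketch using the case 1 data from the question; the names distances and midpoint_score are only illustrative. It computes the signed distance of each training point to the line w.x + b = 0 and the decision value at the midpoint (0.5, 0.5). If the two distances have equal magnitude and opposite sign, the fitted boundary sits halfway between the classes and any asymmetry comes from the plotting step.

```python
import numpy as np
from sklearn.svm import SVC

# Case 1 data from the question
f = np.array([[1, 1], [0, 0]])
labels = np.array([1, 0])

model = SVC(kernel='linear')
model.fit(f, labels)

w = model.coef_[0]
b = model.intercept_[0]

# Signed distance of each training point to the line w.x + b = 0
distances = (f @ w + b) / np.linalg.norm(w)

# Decision value halfway between the two points (illustrative check)
midpoint_score = model.decision_function([[0.5, 0.5]])[0]

print("w =", w, "b =", b)
print("signed distances:", distances)
print("decision value at midpoint:", midpoint_score)
```

No plotting library is involved here, so the numbers reflect only what SVC learned.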
