Reputation: 23
I've been following Prof. Ng's lectures and trying to implement an SVM in my Jupyter notebook using TensorFlow. However, my model doesn't seem to converge properly.
I suspect my loss function is wrong, and that may be why the model fits improperly.
Below is the full graph-construction code for my model:
import numpy as np
import tensorflow as tf

tf.reset_default_graph()

# training hyperparameters
learning_rate = 0.000001
C = 20
gamma = 50

# placeholders for the inputs, labels and landmark points
X = tf.placeholder(tf.float32, shape=(None, 2))
Y = tf.placeholder(tf.float32, shape=(None, 1))
landmark = tf.placeholder(tf.float32, shape=(None, 2))

# num_data is the number of training points (each one is also used as a landmark)
W = tf.Variable(np.random.random((num_data)), dtype=tf.float32)
B = tf.Variable(np.random.random((1)), dtype=tf.float32)

batch_size = tf.shape(X)[0]

# RBF kernel: similarity between every input point and every landmark
tile = tf.tile(X, (1, num_data))
diff = tf.reshape(tile, (-1, num_data, 2)) - landmark
tile_shape = tf.shape(diff)
sq_diff = tf.square(diff)
sq_dist = tf.reduce_sum(sq_diff, axis=2)
F = tf.exp(tf.negative(sq_dist * gamma))

# decision function and predicted class
WF = tf.reduce_sum(W * F, axis=1) + B
condition = tf.greater_equal(WF, 0)
H = tf.where(condition, tf.ones_like(WF), tf.zeros_like(WF))

# hinge-style loss plus L2 penalty on the weights
ERROR_LOSS = C * tf.reduce_sum(Y * tf.maximum(0., 1 - WF) + (1 - Y) * tf.maximum(0., 1 + WF))
WEIGHT_LOSS = tf.reduce_sum(tf.square(W)) / 2
TOTAL_LOSS = ERROR_LOSS + WEIGHT_LOSS

optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train = optimizer.minimize(TOTAL_LOSS)
I'm using a Gaussian kernel and feeding the whole training set as the landmarks.
The loss function is exactly the one shown in the lecture, assuming I've implemented it correctly.
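Written out, the loss the code above computes is C * sum_i [ y_i * max(0, 1 - (w.f_i + b)) + (1 - y_i) * max(0, 1 + (w.f_i + b)) ] + (1/2) * sum_j w_j^2, where f_i is the vector of RBF kernel values between x_i and the landmarks.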
I'm pretty sure that I'm missing something.
Upvotes: 1
Views: 4055
Reputation: 56
Note that the kernel matrix should have batch_size^2 entries, while your tensor WF has shape (batch_size, 2). The idea is to compute K(x_i, x_j) for each pair (x_i, x_j) in your dataset, and then use these kernel values as inputs to the SVM.
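For example, a minimal TF 1.x sketch of the full pairwise RBF kernel matrix (reusing the question's names X, landmark and gamma; the shapes and the gamma value are just assumptions) could look like this:

import tensorflow as tf

# inputs and landmarks, both with 2 features, as in the question
X = tf.placeholder(tf.float32, shape=(None, 2))
landmark = tf.placeholder(tf.float32, shape=(None, 2))
gamma = 50.0

# squared Euclidean distances between every row of X and every landmark:
# ||x_i - l_j||^2 = ||x_i||^2 - 2 * x_i . l_j + ||l_j||^2
sq_X = tf.reduce_sum(tf.square(X), axis=1, keepdims=True)         # (batch, 1)
sq_L = tf.reduce_sum(tf.square(landmark), axis=1, keepdims=True)  # (num_landmarks, 1)
cross = tf.matmul(X, landmark, transpose_b=True)                  # (batch, num_landmarks)
sq_dist = sq_X - 2.0 * cross + tf.transpose(sq_L)

# kernel matrix: K[i, j] = exp(-gamma * ||x_i - l_j||^2)
K = tf.exp(-gamma * sq_dist)

Feeding the whole training set as both X and landmark then gives the batch_size x batch_size kernel matrix described above.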
I'm using Andrew Ng's lecture notes on SVMs as a reference; on page 20 he derives the final optimization problem. You'll want to replace the inner product <x_i, x_j> with your kernel function.
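For reference, that optimization problem (modulo notation) is: maximize over alpha the quantity sum_i alpha_i - (1/2) sum_{i,j} y_i y_j alpha_i alpha_j <x_i, x_j>, subject to 0 <= alpha_i <= C and sum_i alpha_i y_i = 0; the kernel trick simply replaces each <x_i, x_j> with K(x_i, x_j).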
I would recommend starting with a linear kernel instead of an RBF and comparing your code against an out-of-the-box SVM implementation like sklearn's. This will help you make sure your optimization code is working properly.
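As a sanity check, something like the following (toy data with made-up labels, purely for comparison) lets you benchmark against scikit-learn directly; swapping kernel='linear' for kernel='rbf', gamma=50, C=20 would match the question's hyperparameters:

import numpy as np
from sklearn.svm import SVC

# toy 2-D data with hypothetical labels, just for comparison purposes
X_train = np.random.randn(200, 2)
y_train = (X_train[:, 0] * X_train[:, 1] > 0).astype(int)

clf = SVC(kernel='linear', C=20.0)
clf.fit(X_train, y_train)
print("sklearn training accuracy:", clf.score(X_train, y_train))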
A final note: though it should be possible to train an SVM using gradient descent, they are almost never trained that way in practice. The SVM optimization problem can be solved via quadratic programming, and most methods for training SVMs take advantage of this.
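For illustration only, here is a rough sketch (not part of the original answer) of solving the soft-margin dual as a QP with cvxopt, assuming a precomputed kernel matrix K, labels y in {-1, +1}, and a penalty C:

import numpy as np
from cvxopt import matrix, solvers

def fit_dual_svm(K, y, C):
    # K: (n, n) kernel matrix, y: labels in {-1, +1}, C: box constraint
    n = K.shape[0]
    y = y.astype(float)
    P = matrix(np.outer(y, y) * K)                     # quadratic term y_i y_j K(x_i, x_j)
    q = matrix(-np.ones(n))                            # linear term: -sum_i alpha_i
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))     # encodes 0 <= alpha_i <= C
    h = matrix(np.hstack([np.zeros(n), C * np.ones(n)]))
    A = matrix(y.reshape(1, -1))                       # equality constraint sum_i alpha_i y_i = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.ravel(sol['x'])                          # the dual variables alpha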
Upvotes: 4