Reputation: 23
I've been following Prof. Ng's lectures and trying to implement an SVM in my Jupyter notebook using TensorFlow. However, my model doesn't seem to converge properly.
I suspect my loss function is wrong, and that may be why the model fits improperly.
Below is the full graph-construction code for my model:
import numpy as np
import tensorflow as tf

tf.reset_default_graph()

# training hyperparameters
learning_rate = 0.000001
C = 20
gamma = 50

# placeholders for the inputs, labels and landmark points
X = tf.placeholder(tf.float32, shape=(None, 2))
Y = tf.placeholder(tf.float32, shape=(None, 1))
landmark = tf.placeholder(tf.float32, shape=(None, 2))

# num_data is the number of training points (each one is also used as a landmark)
W = tf.Variable(np.random.random((num_data)), dtype=tf.float32)
B = tf.Variable(np.random.random((1)), dtype=tf.float32)

batch_size = tf.shape(X)[0]

# RBF kernel: similarity between every input point and every landmark
tile = tf.tile(X, (1, num_data))
diff = tf.reshape(tile, (-1, num_data, 2)) - landmark
tile_shape = tf.shape(diff)
sq_diff = tf.square(diff)
sq_dist = tf.reduce_sum(sq_diff, axis=2)
F = tf.exp(tf.negative(sq_dist * gamma))

# decision function and predicted class
WF = tf.reduce_sum(W * F, axis=1) + B
condition = tf.greater_equal(WF, 0)
H = tf.where(condition, tf.ones_like(WF), tf.zeros_like(WF))

# hinge-style loss plus L2 penalty on the weights
ERROR_LOSS = C * tf.reduce_sum(Y * tf.maximum(0., 1 - WF) + (1 - Y) * tf.maximum(0., 1 + WF))
WEIGHT_LOSS = tf.reduce_sum(tf.square(W)) / 2
TOTAL_LOSS = ERROR_LOSS + WEIGHT_LOSS

optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train = optimizer.minimize(TOTAL_LOSS)
I'm using a Gaussian kernel and feeding the whole training set as the landmarks.
The loss function is exactly the one shown in the lecture, assuming I've implemented it correctly.
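Written out, the loss the code above computes is C * sum_i [ y_i * max(0, 1 - (w.f_i + b)) + (1 - y_i) * max(0, 1 + (w.f_i + b)) ] + (1/2) * sum_j w_j^2, where f_i is the vector of RBF kernel values between x_i and the landmarks.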
I'm pretty sure that I'm missing something.
Upvotes: 1
Views: 4055
Reputation: 56
Note that the kernel matrix should have batch_size^2 entries, while your tensor WF has shape (batch_size, 2). The idea is to compute K(x_i, x_j) for each pair (x_i, x_j) in your dataset, and then use these kernel values as inputs to the SVM.
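For example, a minimal TF 1.x sketch of the full pairwise RBF kernel matrix (reusing the question's names X, landmark and gamma; the shapes and the gamma value are just assumptions) could look like this:

import tensorflow as tf

# inputs and landmarks, both with 2 features, as in the question
X = tf.placeholder(tf.float32, shape=(None, 2))
landmark = tf.placeholder(tf.float32, shape=(None, 2))
gamma = 50.0

# squared Euclidean distances between every row of X and every landmark:
# ||x_i - l_j||^2 = ||x_i||^2 - 2 * x_i . l_j + ||l_j||^2
sq_X = tf.reduce_sum(tf.square(X), axis=1, keepdims=True)         # (batch, 1)
sq_L = tf.reduce_sum(tf.square(landmark), axis=1, keepdims=True)  # (num_landmarks, 1)
cross = tf.matmul(X, landmark, transpose_b=True)                  # (batch, num_landmarks)
sq_dist = sq_X - 2.0 * cross + tf.transpose(sq_L)

# kernel matrix: K[i, j] = exp(-gamma * ||x_i - l_j||^2)
K = tf.exp(-gamma * sq_dist)

Feeding the whole training set as both X and landmark then gives the batch_size x batch_size kernel matrix described above.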
I'm using Andrew Ng's lecture notes on SVMs as a reference; on page 20 he derives the final optimization problem. You'll want to replace the inner product <x_i, x_j> with your kernel function.
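For reference, that optimization problem (modulo notation) is: maximize over alpha the quantity sum_i alpha_i - (1/2) sum_{i,j} y_i y_j alpha_i alpha_j <x_i, x_j>, subject to 0 <= alpha_i <= C and sum_i alpha_i y_i = 0; the kernel trick simply replaces each <x_i, x_j> with K(x_i, x_j).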
I would recommend starting with a linear kernel instead of an RBF and comparing your code against an out-of-the-box SVM implementation like sklearn's. This will help you make sure your optimization code is working properly.
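As a sanity check, something like the following (toy data with made-up labels, purely for comparison) lets you benchmark against scikit-learn directly; swapping kernel='linear' for kernel='rbf', gamma=50, C=20 would match the question's hyperparameters:

import numpy as np
from sklearn.svm import SVC

# toy 2-D data with hypothetical labels, just for comparison purposes
X_train = np.random.randn(200, 2)
y_train = (X_train[:, 0] * X_train[:, 1] > 0).astype(int)

clf = SVC(kernel='linear', C=20.0)
clf.fit(X_train, y_train)
print("sklearn training accuracy:", clf.score(X_train, y_train))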
A final note: though it should be possible to train an SVM using gradient descent, they are almost never trained that way in practice. The SVM optimization problem can be solved via quadratic programming, and most methods for training SVMs take advantage of this.
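For illustration only, here is a rough sketch (not part of the original answer) of solving the soft-margin dual as a QP with cvxopt, assuming a precomputed kernel matrix K, labels y in {-1, +1}, and a penalty C:

import numpy as np
from cvxopt import matrix, solvers

def fit_dual_svm(K, y, C):
    # K: (n, n) kernel matrix, y: labels in {-1, +1}, C: box constraint
    n = K.shape[0]
    y = y.astype(float)
    P = matrix(np.outer(y, y) * K)                     # quadratic term y_i y_j K(x_i, x_j)
    q = matrix(-np.ones(n))                            # linear term: -sum_i alpha_i
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))     # encodes 0 <= alpha_i <= C
    h = matrix(np.hstack([np.zeros(n), C * np.ones(n)]))
    A = matrix(y.reshape(1, -1))                       # equality constraint sum_i alpha_i y_i = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.ravel(sol['x'])                          # the dual variables alpha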
Upvotes: 4