TensorFlow GradientDescentOptimizer not convering on expected cost

Question

I'm reviewing the material I did in Andrew Ng's class on ML and trying to implement it in TensorFlow. I was able to use scipy's optimize function to get a cost of 0.213, but with TensorFlow, it's stuck at 0.622, not very far from the initial loss of 0.693 using an initial set of weights of zero.

I reviewed the post here and added a tf.maximum call to my loss function to prevent NaN's. I'm not convinced this is the right approach and I'm sure there is a better way. I also tried using tf.clip_by_value instead but that gives the same non-optimized cost.

iterations = 1500

with tf.Session() as sess:
    X = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)
    theta = tf.Variable(tf.zeros([3,1]), dtype=tf.float32)
    training_rows = tf.placeholder(tf.float32)
    z = tf.matmul(X, theta)
    h_x = 1.0 / (1.0 + tf.exp(-z)) 
    lhs = tf.matmul(tf.transpose(-y), tf.log(tf.maximum(1e-5, h_x)))
    rhs = tf.matmul(tf.transpose((1 - y)), tf.log(tf.maximum(1e-5, 1 - h_x)))
    loss = tf.reduce_sum(lhs - rhs) / training_rows
    alpha = 0.001
    optimizer = tf.train.GradientDescentOptimizer(alpha)
    train = optimizer.minimize(loss)

    # Run the session
    X_val, y_val = get_data()
    rows = X_val.shape[0]
    kwargs = {X: X_val, y: y_val, training_rows: rows}
    sess.run(tf.global_variables_initializer())
    sess.run(tf.assign(theta, np.array([0,0,0]).reshape(3,1)))
    print("Original cost before optimization is: {}".format(sess.run(loss, kwargs)))
    print("Optimizing loss function")
    costs = []
    for i in range(iterations):
        optimal_theta, result = sess.run([theta, train], {X: X_val, y: y_val, training_rows: rows})
        cost = sess.run(loss, kwargs)
        costs.append(cost)
    optimal_theta,loss = sess.run([theta, loss], {X: X_val, y: y_val, training_rows: rows})
    print("Optimal value for theta is: {} with a loss of: {}".format(optimal_theta, loss))
    plt.plot(costs)
    plt.show()

I also noticed that any learning rate greater than 0.001 would cause the optimizer to dance wildly back and forth with the loss. Is that normal? Finally, when I tried increasing the iterations to 25,000 I realized the cost when down to 0.53. I was expecting that it would converge in much fewer iterations.

TensorFlow GradientDescentOptimizer not convering on expected cost

Answers (1)

Related Questions