cantdutchthis

Reputation: 34527

tf.where causes optimiser to fail in tensorflow

I want to check whether I can solve this problem with tensorflow instead of pymc3. The idea of the experiment is that I define a probabilistic system that contains a switchpoint. I can use sampling as a method of inference, but I started wondering why I couldn't just do this with gradient descent instead.

I decided to do the gradient search in tensorflow, but it seems that tensorflow has a hard time performing a gradient search when tf.where is involved.

You can find the code below.

import tensorflow as tf
import numpy as np

x1 = np.random.randn(50)+1
x2 = np.random.randn(50)*2 + 5
x_all = np.hstack([x1, x2])
len_x = len(x_all)
time_all = np.arange(1, len_x + 1)

mu1 = tf.Variable(0, name="mu1", dtype=tf.float32)
mu2 = tf.Variable(5, name = "mu2", dtype=tf.float32)
sigma1 = tf.Variable(2, name = "sigma1", dtype=tf.float32)
sigma2 = tf.Variable(2, name = "sigma2", dtype=tf.float32)
tau = tf.Variable(10, name = "tau", dtype=tf.float32)

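# Piecewise parameters: observations before the switchpoint tau get (mu1, sigma1), the rest get (mu2, sigma2)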
mu = tf.where(time_all < tau,
              tf.ones(shape=(len_x,), dtype=tf.float32) * mu1,
              tf.ones(shape=(len_x,), dtype=tf.float32) * mu2)
sigma = tf.where(time_all < tau,
              tf.ones(shape=(len_x,), dtype=tf.float32) * sigma1,
              tf.ones(shape=(len_x,), dtype=tf.float32) * sigma2)

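# Elementwise Gaussian log-density: log(1/sqrt(2*pi*sigma^2)) - (x - mu)^2 / (2*sigma^2)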
likelihood_arr = tf.log(tf.sqrt(1/(2*np.pi*tf.pow(sigma, 2)))) - tf.pow(x_all - mu, 2)/(2*tf.pow(sigma, 2))
total_likelihood = tf.reduce_sum(likelihood_arr, name="total_likelihood")

optimizer = tf.train.RMSPropOptimizer(0.01)
opt_task = optimizer.minimize(-total_likelihood)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    print("these variables should be trainable: {}".format([_.name for _ in tf.trainable_variables()]))
    for step in range(10000):
        _lik, _ = sess.run([total_likelihood, opt_task])
        if step % 1000 == 0:
            variables = {_.name:_.eval() for _ in [mu1, mu2, sigma1, sigma2, tau]}
            print("step: {}, values: {}".format(str(step).zfill(4), variables))

You'll notice that the tau parameter does not change even though tensorflow seems to be aware of the variable and its gradient. Any clue as to what is going wrong? Is this something that can be calculated in tensorflow, or do I need a different pattern?

Upvotes: 2

Views: 1256

Answers (2)

cantdutchthis

Reputation: 34527

The accepted answer is correct, but it doesn't contain the code that solves my problem. The following snippet does:

import tensorflow as tf
import numpy as np
import os
import uuid

TENSORBOARD_PATH = "/tmp/tensorboard-switchpoint"
# tensorboard --logdir=/tmp/tensorboard-switchpoint

x1 = np.random.randn(35)-1
x2 = np.random.randn(35)*2 + 5
x_all = np.hstack([x1, x2])
len_x = len(x_all)
time_all = np.arange(1, len_x + 1)

mu1 = tf.Variable(0, name="mu1", dtype=tf.float32)
mu2 = tf.Variable(0, name = "mu2", dtype=tf.float32)
sigma1 = tf.Variable(2, name = "sigma1", dtype=tf.float32)
sigma2 = tf.Variable(2, name = "sigma2", dtype=tf.float32)
tau = tf.Variable(15, name = "tau", dtype=tf.float32)
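# Smooth switch: ~1 where time_all < tau and ~0 where time_all > tau, so gradients can flow back to tau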
switch = 1. / (1 + tf.exp(time_all - tau))

mu = switch*mu1 + (1-switch)*mu2
sigma = switch*sigma1 + (1-switch)*sigma2

likelihood_arr = tf.log(tf.sqrt(1/(2*np.pi*tf.pow(sigma, 2)))) - tf.pow(x_all - mu, 2)/(2*tf.pow(sigma, 2))
total_likelihood = tf.reduce_sum(likelihood_arr, name="total_likelihood")

optimizer = tf.train.AdamOptimizer()
opt_task = optimizer.minimize(-total_likelihood)
init = tf.global_variables_initializer()

tf.summary.scalar("mu1", mu1)
tf.summary.scalar("mu2", mu2)
tf.summary.scalar("sigma1", sigma1)
tf.summary.scalar("sigma2", sigma2)
tf.summary.scalar("tau", tau)
tf.summary.scalar("likelihood", total_likelihood)
merged_summary_op = tf.summary.merge_all()

with tf.Session() as sess:
    sess.run(init)
    print("these variables should be trainable: {}".format([_.name for _ in tf.trainable_variables()]))
    uniq_id = os.path.join(TENSORBOARD_PATH, "switchpoint-" + uuid.uuid1().__str__()[:4])
    summary_writer = tf.summary.FileWriter(uniq_id, graph=tf.get_default_graph())
    for step in range(40000):
        lik, opt, summary = sess.run([total_likelihood, opt_task, merged_summary_op])
        if step % 100 == 0:
            variables = {_.name:_.eval() for _ in [total_likelihood]}
            summary_writer.add_summary(summary, step)
            print("i{}: {}".format(str(step).zfill(5), variables))

Upvotes: 2

interjay

Reputation: 110108

tau is only used in the condition argument of tf.where: tf.where(time_all < tau, ...). This condition is a boolean tensor, and since calculating gradients only makes sense for continuous values, the gradient of the output with respect to tau will be zero.

Even ignoring tf.where, you used tau in the expression time_all < tau, which is constant almost everywhere and therefore has a gradient of zero.

Because the gradient is zero, there is no way to learn tau with gradient descent methods.
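
A quick way to see this (a minimal sketch, assuming the TF 1.x API used in the question; the tensors are just illustrative): asking tf.gradients for the derivative of a tf.where output with respect to tau returns None, because the comparison op in the condition provides no gradient path.

import numpy as np
import tensorflow as tf

# tau only enters the graph through the boolean condition of tf.where,
# so no gradient flows back to it.
time_all = tf.constant(np.arange(1, 11), dtype=tf.float32)
tau = tf.Variable(5.0, name="tau", dtype=tf.float32)

out = tf.reduce_sum(tf.where(time_all < tau,
                             tf.zeros_like(time_all),
                             tf.ones_like(time_all)))

print(tf.gradients(out, tau))  # [None]: no gradient path through the condition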

Depending on your problem, instead of a hard switch between two values you might be able to use a weighted sum p*val1 + (1-p)*val2, where p depends on tau in a continuous manner.

Upvotes: 4
