DsCpp

Reputation: 2489

Tensorflow - trainable variable does not change over time

I'm trying to apply two different masking methods to an input tensor, one is a half normal distribution filter and the other is a simple step function.

While the half-Gaussian filter works fine, when I try to apply the step function filter, the variable (i.e. the one that defines the point where the step occurs) doesn't seem to learn at all.

This is the filter code:

import numpy as np
import tensorflow as tf

def per_kernel_step_filter(input, weight_param=20, trainable=True):
    input_shape = input.get_shape().as_list()

    # One learnable cut-off point per kernel (last dimension of the input).
    weight_param_v = tf.Variable(np.full((input_shape[-1],), weight_param),
                                 dtype=tf.float32, trainable=trainable)
    weight_param_v_c = tf.clip_by_value(weight_param_v, 0, input_shape[-2])
    # Binary mask: 1 up to the cut-off point, 0 afterwards.
    kernel_filter = tf.transpose(
        tf.sequence_mask(weight_param_v_c, input_shape[-2], dtype=tf.float32))
    kernel_filter = tf.reshape(
        kernel_filter, tf.concat([(1, 1), kernel_filter.get_shape()], 0))

    output = input * kernel_filter
    tf.summary.histogram("weight_param histogram", weight_param_v)

    return output

And from TensorBoard it looks like the variable isn't even attached to the Adam optimizer at the end,

and weight_param_v stays flat at weight_param.

Is it possible that, because of other operations such as sequence_mask, the variable becomes non-trainable?

Upvotes: 3

Views: 731

Answers (1)

javidcf

Reputation: 59731

The problem in this case is that tf.sequence_mask is not differentiable, that is, there is no analytical function that tells you how much the output (or the loss) changes if you apply a small change to weight_param_v. A possible workaround is to use some sigmoid or smoothstep function instead. For example, you could use the logistic function (tf.math.sigmoid), shifted so it is centered around the step point, and you can manipulate the points where it is evaluated to control how "steep" it is (note this will affect the gradients and in turn the ability of the variable to learn).
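As a rough sketch (untested against your model; the function name, the steepness parameter, and the 4-D input layout implied by your reshape are my assumptions), the step mask could be replaced by a shifted sigmoid like this:

import numpy as np
import tensorflow as tf

def per_kernel_soft_step_filter(input, weight_param=20, steepness=1.0, trainable=True):
    # Sketch only: assumes the same shapes as in the question (4-D input,
    # masking along the second-to-last axis, one cut-off per kernel).
    input_shape = input.get_shape().as_list()
    seq_len = input_shape[-2]
    num_kernels = input_shape[-1]

    weight_param_v = tf.Variable(np.full((num_kernels,), weight_param),
                                 dtype=tf.float32, trainable=trainable)
    weight_param_v_c = tf.clip_by_value(weight_param_v, 0, seq_len)

    # Positions 0 .. seq_len - 1 along the masked axis.
    positions = tf.cast(tf.range(seq_len), tf.float32)
    # Soft step: close to 1 below the cut-off, close to 0 above it, with a smooth
    # (differentiable) transition whose width is controlled by `steepness`.
    kernel_filter = tf.math.sigmoid(
        (weight_param_v_c[tf.newaxis, :] - positions[:, tf.newaxis]) / steepness)
    kernel_filter = tf.reshape(kernel_filter, [1, 1, seq_len, num_kernels])

    output = input * kernel_filter
    tf.summary.histogram("weight_param histogram", weight_param_v)
    return output

Because every operation on the path from weight_param_v to the output is now differentiable, the optimizer can propagate gradients back to the cut-off points.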

In general, you can use tf.gradients to check if something is differentiable or not. For example, if you have a function my_function, you can take an input x and define y = my_function(x), then check the output of tf.gradients(y, x); if it is [None], then the function is not differentiable.

import tensorflow as tf

x = tf.placeholder(tf.float32, [None])

# Squaring is differentiable
print(tf.gradients(tf.square(x), x))
# [<tf.Tensor 'gradients/Square_grad/Mul_1:0' shape=(?,) dtype=float32>]

# Flooring is not differentiable
print(tf.gradients(tf.floor(x), x))
# [None]

# Sequence mask is not differentiable
print(tf.gradients(tf.sequence_mask(x, dtype=tf.float32), x))
# [None]

# Gather is differentiable for the parameters but not for the indices
x2 = tf.placeholder(tf.int32, [None])
print(tf.gradients(tf.gather(x, x2), [x, x2]))
# [<tensorflow.python.framework.ops.IndexedSlices object at 0x000001F6EDD09160>, None]

A tricky thing, which I think is what was happening to you in this case, is that training may still run even if there are some None gradients. As long as there is some valid gradient, TensorFlow (or, more specifically, tf.train.Optimizer and its subclasses) assumes that None gradients are irrelevant and silently skips those variables. One possible check you could do is, instead of calling minimize directly, call compute_gradients and verify there are no None gradients before calling apply_gradients.
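As an illustration (the variables here are made up to mirror your situation: one variable only reaches the loss through tf.sequence_mask, another has a valid gradient path), that check could look like this:

import tensorflow as tf

x = tf.placeholder(tf.float32, [5, 3])
length = tf.Variable(2.0)   # only reaches the loss through sequence_mask
scale = tf.Variable(1.0)    # has a valid gradient path

mask = tf.sequence_mask(length, 3, dtype=tf.float32)   # non-differentiable w.r.t. `length`
loss = tf.reduce_sum(x * mask * scale)

opt = tf.train.AdamOptimizer(0.01)
grads_and_vars = opt.compute_gradients(loss)
for grad, var in grads_and_vars:
    if grad is None:
        print("No gradient for:", var.name)   # flags `length`

# minimize() would silently update only `scale`; checking first makes the issue visible.
train_op = opt.apply_gradients([(g, v) for g, v in grads_and_vars if g is not None])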

Upvotes: 1
