Reputation: 584
I'm sure it's a simple question for someone who specializes in TensorFlow, but I couldn't solve it.
I am trying to execute the following code from GitHub.
When I run AT-LSTM.py, the logging at line 240 produces the output below:
if (global_steps % 100 == 0):
    print("the %i step, train cost is: %f" % (global_steps, cost))
global_steps += 1
Output
the 100 step, train cost is: nan
the 200 step, train cost is: nan
the 300 step, train cost is: nan
the 400 step, train cost is: nan
the 500 step, train cost is: nan
the 600 step, train cost is: nan
the 700 step, train cost is: nan
the 800 step, train cost is: nan
the 900 step, train cost is: nan
the 1000 step, train cost is: nan
the 1100 step, train cost is: nan
the 1200 step, train cost is: nan
the 1300 step, train cost is: nan
the 1400 step, train cost is: nan
the 1500 step, train cost is: nan
the 1600 step, train cost is: nan
the 1700 step, train cost is: nan
the 1800 step, train cost is: nan
the 1900 step, train cost is: nan
the 2000 step, train cost is: nan
the 2100 step, train cost is: nan
the 2200 step, train cost is: nan
the 2300 step, train cost is: nan
the 2400 step, train cost is: nan
the 2500 step, train cost is: nan
the 2600 step, train cost is: nan
the 2700 step, train cost is: nan
the 2800 step, train cost is: nan
the 2900 step, train cost is: nan
the 3000 step, train cost is: nan
the 3100 step, train cost is: nan
the 3200 step, train cost is: nan
The cost is NaN at every iteration. Do you have any idea why I am getting NaN values at every step?
Upvotes: 0
Views: 219
Reputation: 7170
There are a few potential reasons this could be happening. The most common culprits are exploding or vanishing gradients.
Exploding gradients occur when the gradient, well, "explodes" into a very large number. This can be controlled by gradient clipping. A common way to do this is to clip by norm before you apply your gradients. If you control your train_step, you can do it like this:
with tf.GradientTape() as tape:
    logits = self(x_batch, training=True)
    loss = self.compiled_loss(y_true, logits)

# backprop
grads = tape.gradient(loss, self.trainable_weights)
grads = [
    tf.clip_by_norm(g, self.gradient_clip_norm)  # tunable parameter
    for g in grads
]
self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
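If you are using a standard Keras training loop rather than a custom train_step, a simpler option is to pass clipnorm (or clipvalue) directly to the optimizer. This is a minimal sketch, not code from the repository; the clip value of 1.0 is an illustrative choice, and model, x_train, and y_train stand in for your own model and data:

```python
import tensorflow as tf

# clipnorm rescales each gradient tensor so its L2 norm is at most 1.0;
# 1.0 is an illustrative value you would tune for your model.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)

model.compile(optimizer=optimizer, loss="categorical_crossentropy")
model.fit(x_train, y_train, epochs=10)
```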
The alternative case, a vanishing gradient, can occur in networks where the error signal cannot propagate through the entire network; this can happen for a number of reasons.
You can use a lower learning rate as a first step, but if that still does not work, you could explore residual connections in your network architecture, which can help with vanishing gradients (see the sketch below).
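As an illustration of the residual-connection suggestion, here is a minimal sketch using the Keras functional API; the layer sizes and shapes are arbitrary and only meant to show the pattern of adding a block's input back to its output:

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(128,))

# A small block whose output has the same shape as its input,
# so the two tensors can be added together.
x = layers.Dense(128, activation="relu")(inputs)
x = layers.Dense(128)(x)

# Residual connection: the identity path gives gradients a direct
# route back to earlier layers, which mitigates vanishing gradients.
x = layers.Add()([inputs, x])
x = layers.Activation("relu")(x)

outputs = layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
```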
Upvotes: 1
Reputation: 1909
A common cause of this in RNNs/LSTMs is exploding gradients; you can avoid it with gradient clipping, e.g. tf.clip_by_value or tf.clip_by_norm (How to apply gradient clipping in TensorFlow?).
You can also get this from negative labels or a learning rate that is too large. Also, check your weight initialization.
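Since AT-LSTM.py appears to use the graph-style (TF 1.x) API, here is a minimal sketch of gradient clipping in that style, along the lines of the linked question; the clip norm of 5.0, the learning rate, and the variable names are illustrative assumptions, not values from the repository:

```python
import tensorflow as tf

# Assumes `loss` is the model's cost tensor and the code runs in
# graph mode (tf.compat.v1 when using TF 2.x).
optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=1e-3)

params = tf.compat.v1.trainable_variables()
grads = tf.gradients(loss, params)

# Rescale all gradients jointly so their global norm is at most 5.0.
clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
train_op = optimizer.apply_gradients(zip(clipped_grads, params))
```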
Upvotes: 2