Reputation: 584
I'm sure it's a simple question for someone who specializes in TensorFlow, but I couldn't solve it.
I am trying to execute the following code from GitHub.
When I run AT-LSTM.py, the logging at line 240 produces the output below:
if (global_steps % 100 == 0):
    print("the %i step, train cost is: %f" % (global_steps, cost))
global_steps += 1
Output
the 100 step, train cost is: nan
the 200 step, train cost is: nan
the 300 step, train cost is: nan
the 400 step, train cost is: nan
the 500 step, train cost is: nan
the 600 step, train cost is: nan
the 700 step, train cost is: nan
the 800 step, train cost is: nan
the 900 step, train cost is: nan
the 1000 step, train cost is: nan
the 1100 step, train cost is: nan
the 1200 step, train cost is: nan
the 1300 step, train cost is: nan
the 1400 step, train cost is: nan
the 1500 step, train cost is: nan
the 1600 step, train cost is: nan
the 1700 step, train cost is: nan
the 1800 step, train cost is: nan
the 1900 step, train cost is: nan
the 2000 step, train cost is: nan
the 2100 step, train cost is: nan
the 2200 step, train cost is: nan
the 2300 step, train cost is: nan
the 2400 step, train cost is: nan
the 2500 step, train cost is: nan
the 2600 step, train cost is: nan
the 2700 step, train cost is: nan
the 2800 step, train cost is: nan
the 2900 step, train cost is: nan
the 3000 step, train cost is: nan
the 3100 step, train cost is: nan
the 3200 step, train cost is: nan
The cost is NaN at every iteration. Do you have any idea why I am getting NaN values at every step?
Upvotes: 0
Views: 219
Reputation: 7170
There are a few potential reasons this could be happening. The most common culprits are exploding or vanishing gradients.
Exploding gradients occur when the gradient, well, "explodes" into a very large number. This can be controlled by gradient clipping. A common way to do this is to clip by norm before you apply your gradients. If you control your train_step, you can do it like this:
with tf.GradientTape() as tape:
    logits = self(x_batch, training=True)
    loss = self.compiled_loss(y_true, logits)

# backprop
grads = tape.gradient(loss, self.trainable_weights)
grads = [
    tf.clip_by_norm(g, self.gradient_clip_norm)  # tunable parameter
    for g in grads
]
self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
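If you are using a standard Keras training loop rather than a custom train_step, a simpler option is to pass clipnorm (or clipvalue) directly to the optimizer. This is a minimal sketch, not code from the repository; the clip value of 1.0 is an illustrative choice, and model, x_train, and y_train stand in for your own model and data:

```python
import tensorflow as tf

# clipnorm rescales each gradient tensor so its L2 norm is at most 1.0;
# 1.0 is an illustrative value you would tune for your model.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)

model.compile(optimizer=optimizer, loss="categorical_crossentropy")
model.fit(x_train, y_train, epochs=10)
```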
The alternative case, a vanishing gradient, can occur in networks where the error signal cannot propagate through the entire network; this can happen for a number of reasons.
You can use a lower learning rate as a first step, but if that still does not work, you could explore residual connections in your network architecture, which can help with vanishing gradients (see the sketch below).
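As an illustration of the residual-connection suggestion, here is a minimal sketch using the Keras functional API; the layer sizes and shapes are arbitrary and only meant to show the pattern of adding a block's input back to its output:

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(128,))

# A small block whose output has the same shape as its input,
# so the two tensors can be added together.
x = layers.Dense(128, activation="relu")(inputs)
x = layers.Dense(128)(x)

# Residual connection: the identity path gives gradients a direct
# route back to earlier layers, which mitigates vanishing gradients.
x = layers.Add()([inputs, x])
x = layers.Activation("relu")(x)

outputs = layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
```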
Upvotes: 1
Reputation: 1909
A common cause of this in RNNs/LSTMs is exploding gradients; you can avoid it with gradient clipping, e.g. tf.clip_by_value or tf.clip_by_norm (How to apply gradient clipping in TensorFlow?).
You can also get this from negative labels or a learning rate that is too large. Also, check your weight initialization.
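Since AT-LSTM.py appears to use the graph-style (TF 1.x) API, here is a minimal sketch of gradient clipping in that style, along the lines of the linked question; the clip norm of 5.0, the learning rate, and the variable names are illustrative assumptions, not values from the repository:

```python
import tensorflow as tf

# Assumes `loss` is the model's cost tensor and the code runs in
# graph mode (tf.compat.v1 when using TF 2.x).
optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=1e-3)

params = tf.compat.v1.trainable_variables()
grads = tf.gradients(loss, params)

# Rescale all gradients jointly so their global norm is at most 5.0.
clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
train_op = optimizer.apply_gradients(zip(clipped_grads, params))
```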
Upvotes: 2