Rocket Pingu

Reputation: 621

CTC LossTensor is inf or nan: Tensor had Inf values?

I keep encountering this error right on the first step of training (or after 300 steps or so). Can anyone point out why this is happening? If you're interested in the model I used, here it is:

{
  "network":[
    {"layer_type": "input_layer", "name": "inputs", "shape": [-1, 168, 168, 1]},
    {"layer_type": "l2_normalize", "axis": [1, 2]},
    {"layer_type": "conv2d", "num_filters": 16, "kernel_size": [3, 3]},
    {"layer_type": "max_pool2d", "pool_size": [2, 2]},
    {"layer_type": "l2_normalize", "axis": [1, 2]},
    {"layer_type": "conv2d", "num_filters": 32, "kernel_size": [3, 3]},
    {"layer_type": "max_pool2d", "pool_size": [2, 2]},
    {"layer_type": "l2_normalize", "axis": [1, 2]},
    {"layer_type": "dropout", "keep_prob": 0.5},
    {"layer_type": "conv2d", "num_filters": 64, "kernel_size": [3, 3]},
    {"layer_type": "max_pool2d", "pool_size": [2, 2]},
    {"layer_type": "l2_normalize", "axis": [1, 2]},
    {"layer_type": "dropout", "keep_prob": 0.5},
    {"layer_type": "collapse_to_rnn_dims"},
    {"layer_type": "birnn", "num_hidden": 128, "cell_type": "LSTM"},
    {"layer_type": "birnn", "num_hidden": 128, "cell_type": "LSTM"},
    {"layer_type": "birnn", "num_hidden": 128, "cell_type": "LSTM"},
    {"layer_type": "dropout", "keep_prob": 0.5}
  ],
  "output_layer": "ctc_decoder",
  "loss": "ctc",
  "metrics": ["label_error_rate"],
  "learning_rate": 0.001,
  "optimizer": "adam"
}

As for the labels, I pad them to match the length of the longest label.
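For reference, a simplified sketch of what the padding plus the SparseTensor conversion for tf.nn.ctc_loss could look like (TensorFlow 1.x assumed; PAD is a hypothetical sentinel, not the exact code from my pipeline):

```python
import numpy as np
import tensorflow as tf

PAD = -1  # placeholder sentinel, stripped again before building the SparseTensor

def pad_labels(labels):
    """Pad variable-length label lists with PAD up to the longest one."""
    max_len = max(len(l) for l in labels)
    return np.array([l + [PAD] * (max_len - len(l)) for l in labels])

def dense_to_sparse(dense_labels):
    """Drop the padding and build the SparseTensor that tf.nn.ctc_loss expects."""
    indices = tf.where(tf.not_equal(dense_labels, PAD))
    values = tf.gather_nd(dense_labels, indices)
    shape = tf.shape(dense_labels, out_type=tf.int64)
    return tf.SparseTensor(indices, tf.cast(values, tf.int32), shape)
```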

Upvotes: 1

Views: 3427

Answers (2)

Rocket Pingu

Reputation: 621

It was indeed the sequence length of the input that caused the problem: the number of time steps fed to the CTC loss has to be at least as long as the ground-truth label (and a bit longer when the label contains repeated characters, since CTC needs blanks between them).
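To put numbers on it with the architecture from the question (a back-of-the-envelope sketch, assuming the width axis becomes the CTC time axis in collapse_to_rnn_dims):

```python
# Each 2x2 max-pool halves the width, which is what CTC later sees as time steps.
def time_steps_after_pooling(width, num_pools, pool_size=2):
    for _ in range(num_pools):
        width //= pool_size
    return width

print(time_steps_after_pooling(168, 3))  # 168 -> 84 -> 42 -> 21
# Any label longer than that has no valid alignment, so the CTC loss becomes inf.
```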

Upvotes: 2

End-2-End

Reputation: 921

I assume you're working on Speech-to-Text or a similar problem. Inf and NaN values are common when your model consists of RNNs / LSTMs; that's part of what makes them tricky to train successfully. What kind of activation function / non-linearity are you using, especially in the RNN layers?

I've often observed NaN and Inf values while training LSTMs, mostly due to the vanishing-gradient and exploding-gradient problems, respectively. I'd suggest using a clipped ReLU activation. In TensorFlow, you can clip at 6 using the built-in tf.nn.relu6, or write a simple custom function to clip at some other value.
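A minimal sketch of the clipping idea (TF 1.x API; the LSTMCell line is just illustrative, since your framework builds the cells from the JSON config):

```python
import tensorflow as tf

def clipped_relu(x, clip_value=6.0):
    # tf.nn.relu6 is the built-in special case for clip_value = 6.
    return tf.minimum(tf.nn.relu(x), clip_value)

# Hypothetical usage: pass it as the cell activation when building the RNN.
cell = tf.nn.rnn_cell.LSTMCell(128, activation=clipped_relu)
```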

If you're still facing the same issue, try modifying the architecture and parameters a bit. Probably go with 2 layers of Bi-LSTM initially and maybe have a smaller hidden-state.

Hope this helps. Try these out and let me know how it works.

Upvotes: 1
