Reputation: 336
I'm trying to build a sequence-to-sequence model in TensorFlow. I have followed several tutorials and all was good, until I reached a point where I decided to remove the teacher forcing from my model. Below is a sample of the decoder network that I'm using:
def decoding_layer_train(encoder_state, dec_cell, dec_embed_input,
                         target_sequence_length, max_summary_length,
                         output_layer, keep_prob):
    """
    Create a decoding layer for training
    :param encoder_state: Encoder State
    :param dec_cell: Decoder RNN Cell
    :param dec_embed_input: Decoder embedded input
    :param target_sequence_length: The lengths of each sequence in the target batch
    :param max_summary_length: The length of the longest sequence in the batch
    :param output_layer: Function to apply the output layer
    :param keep_prob: Dropout keep probability
    :return: BasicDecoderOutput containing training logits and sample_id
    """
    training_helper = tf.contrib.seq2seq.TrainingHelper(inputs=dec_embed_input,
                                                        sequence_length=target_sequence_length,
                                                        time_major=False)
    training_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell, training_helper,
                                                       encoder_state, output_layer)
    training_decoder_output = tf.contrib.seq2seq.dynamic_decode(training_decoder,
                                                                impute_finished=True,
                                                                maximum_iterations=max_summary_length)[0]
    return training_decoder_output
As per my understanding, the TrainingHelper is what does the teacher forcing, especially since it takes the true output as part of its arguments. I tried to use the decoder without a training helper, but it appears to be mandatory. I also tried setting the true output to all zeros, but apparently the real output is needed by the TrainingHelper. I have also tried to Google a solution, but I did not find anything related.
===================Update=============
I apologize for not mentioning this earlier, but I have tried using GreedyEmbeddingHelper as well. The model runs fine for a couple of iterations and then starts throwing a runtime error: it appears that the GreedyEmbeddingHelper starts predicting output of a different shape than expected. Below is my function when using the GreedyEmbeddingHelper:
def decoding_layer_train(encoder_state, dec_cell, dec_embeddings,
                         target_sequence_length, max_summary_length,
                         output_layer, keep_prob):
    """
    Create a decoding layer for training
    :param encoder_state: Encoder State
    :param dec_cell: Decoder RNN Cell
    :param dec_embeddings: Decoder embedding matrix
    :param target_sequence_length: The lengths of each sequence in the target batch
    :param max_summary_length: The length of the longest sequence in the batch
    :param output_layer: Function to apply the output layer
    :param keep_prob: Dropout keep probability
    :return: BasicDecoderOutput containing training logits and sample_id
    """
    start_tokens = tf.tile(tf.constant([target_vocab_to_int['<GO>']], dtype=tf.int32),
                           [batch_size], name='start_tokens')
    training_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(dec_embeddings,
                                                               start_tokens,
                                                               target_vocab_to_int['<EOS>'])
    training_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell, training_helper,
                                                       encoder_state, output_layer)
    training_decoder_output = tf.contrib.seq2seq.dynamic_decode(training_decoder,
                                                                impute_finished=True,
                                                                maximum_iterations=max_summary_length)[0]
    return training_decoder_output
This is a sample of the error that gets thrown after a couple of training iterations:
Ok
Epoch 0 Batch 5/91 - Train Accuracy: 0.4347, Validation Accuracy: 0.3557, Loss: 2.8656
++++Epoch 0 Batch 5/91 - Train WER: 1.0000, Validation WER: 1.0000
Epoch 0 Batch 10/91 - Train Accuracy: 0.4050, Validation Accuracy: 0.3864, Loss: 2.6347
++++Epoch 0 Batch 10/91 - Train WER: 1.0000, Validation WER: 1.0000
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-115-1d2a9495ad42> in <module>()
57 target_sequence_length: targets_lengths,
58 source_sequence_length: sources_lengths,
---> 59 keep_prob: keep_probability})
60
61
/Users/alsulaimi/Documents/AI/Tensorflow-make/workspace/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
887 try:
888 result = self._run(None, fetches, feed_dict, options_ptr,
--> 889 run_metadata_ptr)
890 if run_metadata:
891 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/Users/alsulaimi/Documents/AI/Tensorflow-make/workspace/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
1116 if final_fetches or final_targets or (handle and feed_dict_tensor):
1117 results = self._do_run(handle, final_targets, final_fetches,
-> 1118 feed_dict_tensor, options, run_metadata)
1119 else:
1120 results = []
/Users/alsulaimi/Documents/AI/Tensorflow-make/workspace/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1313 if handle is None:
1314 return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1315 options, run_metadata)
1316 else:
1317 return self._do_call(_prun_fn, self._session, handle, feeds, fetches)
/Users/alsulaimi/Documents/AI/Tensorflow-make/workspace/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args)
1332 except KeyError:
1333 pass
-> 1334 raise type(e)(node_def, op, message)
1335
1336 def _extend_graph(self):
InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [1100,78] and labels shape [1400]
I'm not sure, but I guess the GreedyEmbeddingHelper should not be used for training. I would appreciate your help and thoughts on how to stop the teacher forcing.
Thank you.
Upvotes: 4
Views: 2030
Reputation: 3666
There are different Helpers, which all inherit from the same class; you can find more information in the documentation. As you said, TrainingHelper requires predefined true inputs, the tokens the decoder is expected to output, and these true inputs are fed in as the inputs of the next steps (instead of feeding in the output of the previous step). According to some research, this approach can speed up training of the decoder.
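As a side note, the same module also has a helper that sits between the two extremes: ScheduledEmbeddingTrainingHelper, which with some probability feeds back the decoder's own sampled output instead of the ground truth (scheduled sampling). A minimal sketch, reusing the names from your question; treat the exact arguments as an assumption to check against your TF 1.x version:

# Hypothetical sketch: mixes teacher forcing with the model's own predictions.
# sampling_probability=0.0 behaves like TrainingHelper; higher values feed
# back the decoder's sampled output more often.
scheduled_helper = tf.contrib.seq2seq.ScheduledEmbeddingTrainingHelper(
    inputs=dec_embed_input,
    sequence_length=target_sequence_length,
    embedding=dec_embeddings,
    sampling_probability=0.5)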
In your case, you are looking for GreedyEmbeddingHelper. Just use it in place of TrainingHelper, like this:
training_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
    embedding=embedding,
    start_tokens=tf.tile([GO_SYMBOL], [batch_size]),
    end_token=END_SYMBOL)
Just replace embedding and the other variables with the ones you use in your problem. This helper automatically takes the output of a step, applies the embedding, and feeds it as the input of the next step; for the first step, the start_tokens are used.
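Putting it together with the decoder from your question (a minimal sketch; dec_embeddings, target_vocab_to_int, dec_cell, encoder_state, output_layer, batch_size and max_summary_length are the names from your own code):

inference_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
    embedding=dec_embeddings,
    start_tokens=tf.tile(tf.constant([target_vocab_to_int['<GO>']], dtype=tf.int32),
                         [batch_size]),
    end_token=target_vocab_to_int['<EOS>'])

inference_decoder = tf.contrib.seq2seq.BasicDecoder(
    dec_cell, inference_helper, encoder_state, output_layer)

# maximum_iterations caps how many steps the greedy decoder can run, since
# there is no target sequence to bound its length any more.
inference_decoder_output, _, final_seq_lengths = tf.contrib.seq2seq.dynamic_decode(
    inference_decoder,
    impute_finished=True,
    maximum_iterations=max_summary_length)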
The resulting output produced with GreedyEmbeddingHelper doesn't have to match the length of the expected output, so you have to use padding to match their shapes. TensorFlow provides the function tf.pad(). Also, tf.contrib.seq2seq.dynamic_decode returns a tuple containing (final_outputs, final_state, final_sequence_lengths), so you can use the value of final_sequence_lengths for the padding.
logits_pad = tf.pad(
    logits,
    [[0, tf.maximum(expected_length - tf.reduce_max(final_seq_lengths), 0)],
     [0, 0]],
    constant_values=PAD_VALUE,
    mode='CONSTANT')

targets_pad = tf.pad(
    targets,
    [[0, tf.maximum(tf.reduce_max(final_seq_lengths) - expected_length, 0)]],
    constant_values=PAD_VALUE,
    mode='CONSTANT')
You may have to change the padding a little depending on the shapes of your inputs. Also, you don't have to pad the targets if you set the maximum_iterations parameter to match the targets' shape.
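For the concrete error in your question (logits shape [1100, 78] vs. labels shape [1400]), here is a sketch of that alignment, assuming logits of shape [batch, time, vocab] coming out of dynamic_decode with time_major=False and targets of shape [batch, max_target_len]; the variable names are illustrative:

# The greedy decoder may finish early (every sequence emitted <EOS>), so its
# time axis can be shorter than the targets'. Pad the logits' time axis up to
# the targets' length before computing the sequence loss.
decoded_steps = tf.shape(logits)[1]
target_steps = tf.shape(targets)[1]     # max_summary_length in your code
logits_padded = tf.pad(
    logits,
    [[0, 0],                                            # batch axis
     [0, tf.maximum(target_steps - decoded_steps, 0)],  # time axis
     [0, 0]],                                           # vocab axis
    mode='CONSTANT',
    constant_values=0)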
Upvotes: 2