Piyank Sarawagi

Reputation: 45

Implementing attention with beam search in TensorFlow

I have written my own code with reference to this wonderful tutorial, and I am not able to get correct results when using attention with beam search. As I understand it, in the class AttentionModel the _build_decoder_cell function creates a separate decoder cell and attention wrapper for inference mode. Assuming that is the intended design (which I think is incorrect, and I can't find a way around it), here is my code:

with tf.name_scope("Decoder"):

mem_units = 2*dim
dec_cell = tf.contrib.rnn.BasicLSTMCell( 2*dim )
beam_cell = tf.contrib.rnn.BasicLSTMCell( 2*dim )
beam_width = 3
out_layer = Dense( output_vocab_size )

with tf.name_scope("Training"):
    attn_mech = tf.contrib.seq2seq.BahdanauAttention( num_units = mem_units, memory = enc_rnn_out, normalize = True )
    attn_cell = tf.contrib.seq2seq.AttentionWrapper( cell = dec_cell, attention_mechanism = attn_mech )

    batch_size = tf.shape(enc_rnn_out)[0]
    initial_state = attn_cell.zero_state( batch_size = batch_size , dtype=tf.float32 )
    initial_state = initial_state.clone(cell_state = enc_rnn_state)

    helper = tf.contrib.seq2seq.TrainingHelper( inputs = emb_x_y , sequence_length = seq_len )
    decoder = tf.contrib.seq2seq.BasicDecoder( cell = attn_cell, helper = helper, initial_state = initial_state, output_layer = out_layer )
    outputs, final_state, final_sequence_lengths = tf.contrib.seq2seq.dynamic_decode( decoder = decoder, impute_finished = True )

    training_logits = tf.identity(outputs.rnn_output )
    training_pred = tf.identity(outputs.sample_id )

with tf.name_scope("Inference"):

    enc_rnn_out_beam   = tf.contrib.seq2seq.tile_batch( enc_rnn_out   , beam_width )
    seq_len_beam       = tf.contrib.seq2seq.tile_batch( seq_len       , beam_width )
    enc_rnn_state_beam = tf.contrib.seq2seq.tile_batch( enc_rnn_state , beam_width )

    batch_size_beam      = tf.shape(enc_rnn_out_beam)[0]   # batch size is now beam_width times the original

    # start_tokens must have the original batch size, so divide by beam_width
    start_tokens = tf.tile(tf.constant([27], dtype=tf.int32), [ batch_size_beam//beam_width ] )
    end_token = 0

    attn_mech_beam = tf.contrib.seq2seq.BahdanauAttention( num_units = mem_units,  memory = enc_rnn_out_beam, normalize=True)
    cell_beam = tf.contrib.seq2seq.AttentionWrapper( cell = beam_cell, attention_mechanism = attn_mech_beam, attention_layer_size = mem_units )

    initial_state_beam = cell_beam.zero_state(batch_size=batch_size_beam,dtype=tf.float32).clone(cell_state=enc_rnn_state_beam)

    my_decoder = tf.contrib.seq2seq.BeamSearchDecoder( cell = cell_beam,
                                                       embedding = emb_out,
                                                       start_tokens = start_tokens,
                                                       end_token = end_token,
                                                       initial_state = initial_state_beam,
                                                       beam_width = beam_width,
                                                       output_layer = out_layer )

    beam_output, t1 , t2 = tf.contrib.seq2seq.dynamic_decode(  my_decoder,
                                                                maximum_iterations=maxlen )

    beam_logits = tf.no_op()
    beam_sample_id = beam_output.predicted_ids

When I evaluate beam_sample_id after training, I am not getting correct results.

My guess is that we are supposed to use the same attention wrapper for both modes, but that is not possible, since we have to tile_batch the encoder outputs for beam search to work (see the sketch below).
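
For reference, here is a minimal sketch of what tile_batch does (assuming TF 1.x contrib): it repeats every batch entry multiplier times along the batch axis.

import tensorflow as tf

# tile_batch repeats each batch entry `multiplier` times along axis 0,
# so a batch of size 2 becomes a batch of size 6 with multiplier=3
t = tf.constant([[1, 2], [3, 4]])                        # batch_size = 2
tiled = tf.contrib.seq2seq.tile_batch(t, multiplier=3)   # shape [6, 2]
with tf.Session() as sess:
    print(sess.run(tiled))  # [[1 2] [1 2] [1 2] [3 4] [3 4] [3 4]]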

Any insights / suggestion would be much appreciated.

I have also created an issue for this in their main repository: Issue-93

Upvotes: 4

Views: 3496

Answers (2)

allenyllee

Reputation: 1074

You can use tf.cond() to create different paths for the training and inference stages:

def get_tile_batch(enc_output, source_sequence_length, enc_state, useBeamSearch):
    enc_output = tf.contrib.seq2seq.tile_batch(enc_output, multiplier=useBeamSearch)
    source_sequence_length = tf.contrib.seq2seq.tile_batch(source_sequence_length, multiplier=useBeamSearch)
    enc_state = tf.contrib.seq2seq.tile_batch(enc_state, multiplier=useBeamSearch)
    return enc_output, source_sequence_length, enc_state

## for beam search: at training stage, use tile_batch multiplier = 1,
## at infer stage, use tile_batch multiplier = useBeamSearch
## tile_batch just duplicates every sample in a batch,
## so it changes batch_size to batch_size * useBeamSearch at runtime, once batch_size is determined
enc_output, source_sequence_length, enc_state = tf.cond(
    self.on_infer, # is inference stage?
    lambda: get_tile_batch(enc_output, source_sequence_length, enc_state, useBeamSearch=useBeamSearch),
    lambda: get_tile_batch(enc_output, source_sequence_length, enc_state, useBeamSearch=1)
)

# attention mechanism
attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units=rnn_size, memory=enc_output, memory_sequence_length=source_sequence_length)
dec_cell = tf.contrib.seq2seq.AttentionWrapper(dec_cell, attention_mechanism)
## for beam search: change batch_size to batch_size * useBeamSearch at infer stage
decoder_initial_state = tf.cond(
    self.on_infer, # is inference stage?
    lambda: dec_cell.zero_state(batch_size=batch_size * useBeamSearch, dtype=tf.float32),
    lambda: dec_cell.zero_state(batch_size=batch_size * 1, dtype=tf.float32)
)
decoder_initial_state = decoder_initial_state.clone(cell_state=enc_state)  # seed the zero state with the (possibly tiled) encoder state
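
For completeness, a minimal sketch of how the self.on_infer flag could be wired up (this placeholder, its name, and its default are assumptions, not part of the original code):

# Hypothetical inference flag, e.g. defined in the model's __init__:
# a scalar boolean that defaults to False (training) and is fed as True
# when decoding, so tf.cond takes the tiled beam-search path only at
# inference time.
self.on_infer = tf.placeholder_with_default(False, shape=[], name="on_infer")

# at inference time, feed it explicitly:
# sess.run(predictions, feed_dict={model.on_infer: True})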

Upvotes: 0

mousa alsulaimi

Reputation: 336

I'm not sure what you mean by "I am not able to get results", but I'm assuming that your model is not making use of the weights learnt during training.

If this is the case, then first of all you need to know that it's all about variable sharing. The first thing you need to do is get rid of the different scopes between training and inference, and instead use a shared variable scope, like this:

Remove the

with tf.name_scope("Training"):

and use:

with tf.variable_scope("myScope"):

Then remove the

with tf.name_scope("Inference"):

and use instead:

with tf.variable_scope("myScope", reuse=True):

Also, at the beginning of your graph, right after with tf.variable_scope("myScope"), add:

# tile_batch with multiplier 1 leaves the values unchanged, but makes the
# training graph go through the same ops as the beam-search graph
enc_rnn_out   = tf.contrib.seq2seq.tile_batch( enc_rnn_out   , 1 )
seq_len       = tf.contrib.seq2seq.tile_batch( seq_len       , 1 )
enc_rnn_state = tf.contrib.seq2seq.tile_batch( enc_rnn_state , 1 )

This will ensure that your inference variables and training variables have the same names and are shared. A sketch of the overall layout is shown below.
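
Putting it together, the graph would be laid out roughly like this (a sketch under the assumptions above, reusing the names from the question's code):

with tf.variable_scope("myScope"):
    # training path: tile_batch with multiplier 1 keeps the values unchanged
    # but matches the op signature of the beam-search path below
    enc_rnn_out   = tf.contrib.seq2seq.tile_batch(enc_rnn_out, 1)
    seq_len       = tf.contrib.seq2seq.tile_batch(seq_len, 1)
    enc_rnn_state = tf.contrib.seq2seq.tile_batch(enc_rnn_state, 1)
    # ... build the attention wrapper, TrainingHelper and BasicDecoder here ...

with tf.variable_scope("myScope", reuse=True):
    # inference path: same construction with multiplier = beam_width and a
    # BeamSearchDecoder; reuse=True shares the weights learnt in training
    enc_rnn_out_beam   = tf.contrib.seq2seq.tile_batch(enc_rnn_out, beam_width)
    seq_len_beam       = tf.contrib.seq2seq.tile_batch(seq_len, beam_width)
    enc_rnn_state_beam = tf.contrib.seq2seq.tile_batch(enc_rnn_state, beam_width)
    # ... build the attention wrapper and BeamSearchDecoder here ...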

I tested this while following the same tutorial that you mentioned. My model is still training as I write this post, but I can see the accuracy increasing as we speak, which indicates that the solution should work for you as well.

Thank you.

Upvotes: 2
