Piyank Sarawagi

Reputation: 45

Implementing attention with beam search in TensorFlow

I have written my own code with reference to this wonderful tutorial, and I am not able to get correct results when using attention with beam search. As I understand it, in the class AttentionModel the _build_decoder_cell function creates a separate decoder cell and attention wrapper for inference mode. Assuming that is the intended design (which I think is incorrect, and I can't find a way around it), here is my code:

with tf.name_scope("Decoder"):

mem_units = 2*dim
dec_cell = tf.contrib.rnn.BasicLSTMCell( 2*dim )
beam_cell = tf.contrib.rnn.BasicLSTMCell( 2*dim )
beam_width = 3
out_layer = Dense( output_vocab_size )

with tf.name_scope("Training"):
    attn_mech = tf.contrib.seq2seq.BahdanauAttention( num_units = mem_units, memory = enc_rnn_out, normalize = True )
    attn_cell = tf.contrib.seq2seq.AttentionWrapper( cell = dec_cell, attention_mechanism = attn_mech )

    batch_size = tf.shape(enc_rnn_out)[0]
    initial_state = attn_cell.zero_state( batch_size = batch_size , dtype=tf.float32 )
    initial_state = initial_state.clone(cell_state = enc_rnn_state)

    helper = tf.contrib.seq2seq.TrainingHelper( inputs = emb_x_y , sequence_length = seq_len )
    decoder = tf.contrib.seq2seq.BasicDecoder( cell = attn_cell, helper = helper, initial_state = initial_state, output_layer = out_layer )
    outputs, final_state, final_sequence_lengths = tf.contrib.seq2seq.dynamic_decode( decoder = decoder, impute_finished = True )

    training_logits = tf.identity(outputs.rnn_output )
    training_pred = tf.identity(outputs.sample_id )

with tf.name_scope("Inference"):

    enc_rnn_out_beam   = tf.contrib.seq2seq.tile_batch( enc_rnn_out   , beam_width )
    seq_len_beam       = tf.contrib.seq2seq.tile_batch( seq_len       , beam_width )
    enc_rnn_state_beam = tf.contrib.seq2seq.tile_batch( enc_rnn_state , beam_width )

    batch_size_beam      = tf.shape(enc_rnn_out_beam)[0]   # batch size is now beam_width times the original

    # start_tokens must have the original batch size, so divide by beam_width
    start_tokens = tf.tile(tf.constant([27], dtype=tf.int32), [ batch_size_beam//beam_width ] )
    end_token = 0

    attn_mech_beam = tf.contrib.seq2seq.BahdanauAttention( num_units = mem_units,  memory = enc_rnn_out_beam, normalize=True)
    cell_beam = tf.contrib.seq2seq.AttentionWrapper( cell = beam_cell, attention_mechanism = attn_mech_beam, attention_layer_size = mem_units )

    initial_state_beam = cell_beam.zero_state(batch_size=batch_size_beam,dtype=tf.float32).clone(cell_state=enc_rnn_state_beam)

    my_decoder = tf.contrib.seq2seq.BeamSearchDecoder( cell = cell_beam,
                                                       embedding = emb_out,
                                                       start_tokens = start_tokens,
                                                       end_token = end_token,
                                                       initial_state = initial_state_beam,
                                                       beam_width = beam_width,
                                                       output_layer = out_layer )

    beam_output, t1 , t2 = tf.contrib.seq2seq.dynamic_decode(  my_decoder,
                                                                maximum_iterations=maxlen )

    beam_logits = tf.no_op()
    beam_sample_id = beam_output.predicted_ids

When I evaluate beam_sample_id after training, I am not getting correct results.

My guess is that we are supposed to use the same attention wrapper for both modes, but that is not possible, since we have to tile_batch the encoder outputs for beam search to work (see the sketch below).
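
For reference, here is a minimal sketch of what tile_batch does (assuming TF 1.x contrib): it repeats every batch entry multiplier times along the batch axis.

import tensorflow as tf

# tile_batch repeats each batch entry `multiplier` times along axis 0,
# so a batch of size 2 becomes a batch of size 6 with multiplier=3
t = tf.constant([[1, 2], [3, 4]])                        # batch_size = 2
tiled = tf.contrib.seq2seq.tile_batch(t, multiplier=3)   # shape [6, 2]
with tf.Session() as sess:
    print(sess.run(tiled))  # [[1 2] [1 2] [1 2] [3 4] [3 4] [3 4]]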

Any insights / suggestion would be much appreciated.

I have also created an issue for this in their main repository: Issue-93

Upvotes: 4

Views: 3496

Answers (2)

allenyllee

Reputation: 1074

You can use tf.cond() to create different paths for the training and inference stages:

def get_tile_batch(enc_output, source_sequence_length, enc_state, useBeamSearch):
    enc_output = tf.contrib.seq2seq.tile_batch(enc_output, multiplier=useBeamSearch)
    source_sequence_length = tf.contrib.seq2seq.tile_batch(source_sequence_length, multiplier=useBeamSearch)
    enc_state = tf.contrib.seq2seq.tile_batch(enc_state, multiplier=useBeamSearch)
    return enc_output, source_sequence_length, enc_state

## for beam search: at training stage, use tile_batch multiplier = 1,
## at infer stage, use tile_batch multiplier = useBeamSearch
## tile_batch just duplicates every sample in a batch,
## so it changes batch_size to batch_size * useBeamSearch at runtime, once batch_size is determined
enc_output, source_sequence_length, enc_state = tf.cond(
    self.on_infer, # is inference stage?
    lambda: get_tile_batch(enc_output, source_sequence_length, enc_state, useBeamSearch=useBeamSearch),
    lambda: get_tile_batch(enc_output, source_sequence_length, enc_state, useBeamSearch=1)
)

# attention mechanism
attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units=rnn_size, memory=enc_output, memory_sequence_length=source_sequence_length)
dec_cell = tf.contrib.seq2seq.AttentionWrapper(dec_cell, attention_mechanism)
## for beam search: change batch_size to batch_size * useBeamSearch at infer stage
decoder_initial_state = tf.cond(
    self.on_infer, # is inference stage?
    lambda: dec_cell.zero_state(batch_size=batch_size * useBeamSearch, dtype=tf.float32),
    lambda: dec_cell.zero_state(batch_size=batch_size * 1, dtype=tf.float32)
)
decoder_initial_state = decoder_initial_state.clone(cell_state=enc_state)  # seed the zero state with the (possibly tiled) encoder state
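
For completeness, a minimal sketch of how the self.on_infer flag could be wired up (this placeholder, its name, and its default are assumptions, not part of the original code):

# Hypothetical inference flag, e.g. defined in the model's __init__:
# a scalar boolean that defaults to False (training) and is fed as True
# when decoding, so tf.cond takes the tiled beam-search path only at
# inference time.
self.on_infer = tf.placeholder_with_default(False, shape=[], name="on_infer")

# at inference time, feed it explicitly:
# sess.run(predictions, feed_dict={model.on_infer: True})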

Upvotes: 0

mousa alsulaimi

Reputation: 336

I'm not sure what you mean by "I am not able to get results", but I'm assuming that your model is not making use of the weights learnt during training.

If this is the case, then first of all you need to know that it's all about variable sharing. The first thing you need to do is get rid of the different scopes between training and inference, and instead use a shared variable scope, like this:

Remove the

with tf.name_scope("Training"):

and use:

with tf.variable_scope("myScope"):

Then remove the

with tf.name_scope("Inference"):

and use instead:

with tf.variable_scope("myScope", reuse=True):

Also, at the beginning of your graph, right after with tf.variable_scope("myScope"), add:

# tile_batch with multiplier 1 leaves the values unchanged, but makes the
# training graph go through the same ops as the beam-search graph
enc_rnn_out   = tf.contrib.seq2seq.tile_batch( enc_rnn_out   , 1 )
seq_len       = tf.contrib.seq2seq.tile_batch( seq_len       , 1 )
enc_rnn_state = tf.contrib.seq2seq.tile_batch( enc_rnn_state , 1 )

This will ensure that your inference variables and training variables have the same names and are shared. A sketch of the overall layout is shown below.
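
Putting it together, the graph would be laid out roughly like this (a sketch under the assumptions above, reusing the names from the question's code):

with tf.variable_scope("myScope"):
    # training path: tile_batch with multiplier 1 keeps the values unchanged
    # but matches the op signature of the beam-search path below
    enc_rnn_out   = tf.contrib.seq2seq.tile_batch(enc_rnn_out, 1)
    seq_len       = tf.contrib.seq2seq.tile_batch(seq_len, 1)
    enc_rnn_state = tf.contrib.seq2seq.tile_batch(enc_rnn_state, 1)
    # ... build the attention wrapper, TrainingHelper and BasicDecoder here ...

with tf.variable_scope("myScope", reuse=True):
    # inference path: same construction with multiplier = beam_width and a
    # BeamSearchDecoder; reuse=True shares the weights learnt in training
    enc_rnn_out_beam   = tf.contrib.seq2seq.tile_batch(enc_rnn_out, beam_width)
    seq_len_beam       = tf.contrib.seq2seq.tile_batch(seq_len, beam_width)
    enc_rnn_state_beam = tf.contrib.seq2seq.tile_batch(enc_rnn_state, beam_width)
    # ... build the attention wrapper and BeamSearchDecoder here ...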

I tested this while following the same tutorial that you mentioned. My model is still training as I write this post, but I can see the accuracy increasing as we speak, which indicates that the solution should work for you as well.

Thank you.

Upvotes: 2
