Shamane Siriwardhana

Reputation: 4201

When training a seq2seq model with the bucketing method, do we keep a separate RNN for each bucket?

Let's say we have 3 buckets of different lengths. Do we train 3 different networks? Or can we keep a single dynamic RNN that adds units according to the length of the input sequence in the encoder, with the encoder then passing its last hidden state to the decoder? Would that work?

Upvotes: 1

Views: 449

Answers (1)

Shamane Siriwardhana

Reputation: 4201

I went through this. Bucketing helps to speed up training. We first divide the examples into buckets by length, which reduces the number of positions we have to pad. In each training iteration we select one bucket and train the whole network on a batch from it, so the same network is trained across all buckets rather than a separate RNN per bucket. During validation we check the perplexity of the test examples in each bucket. TensorFlow supports this kind of bucketing.
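To make the bucketing step concrete, here is a minimal sketch in plain Python. It is not TensorFlow's own implementation; the bucket sizes, the `PAD` id, and the helper names (`assign_to_buckets`, `pad_batch`, `sample_batch`) are all assumptions made up for illustration.

```python
import random

# Hypothetical bucket boundaries: each bucket holds sequences up to this length.
BUCKET_SIZES = [5, 10, 20]
PAD = 0  # assumed padding token id


def assign_to_buckets(sequences, bucket_sizes=BUCKET_SIZES):
    """Place each sequence in the smallest bucket that fits it."""
    buckets = {size: [] for size in bucket_sizes}
    for seq in sequences:
        for size in bucket_sizes:
            if len(seq) <= size:
                buckets[size].append(seq)
                break
        # Sequences longer than the largest bucket are skipped in this sketch.
    return buckets


def pad_batch(batch, size, pad=PAD):
    """Right-pad every sequence in the batch to the bucket length."""
    return [seq + [pad] * (size - len(seq)) for seq in batch]


def sample_batch(buckets, batch_size, rng=random):
    """Pick one non-empty bucket, then draw a padded batch from it."""
    size = rng.choice([s for s, seqs in buckets.items() if seqs])
    seqs = buckets[size]
    batch = [rng.choice(seqs) for _ in range(batch_size)]
    return size, pad_batch(batch, size)


sequences = [[1, 2, 3], [4, 5, 6, 7, 8, 9], [1] * 12, [2, 3]]
buckets = assign_to_buckets(sequences)
size, batch = sample_batch(buckets, batch_size=2)
```

Each training step would call something like `sample_batch` and feed the result to the one shared model. A 3-token sequence in the 5-bucket needs only 2 padding positions, instead of 17 if everything were padded to length 20.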

A dynamic RNN is different: there is no bucketing mechanism. We feed the data as a single tensor of shape [batch_size, max_seq_length, input_size].

Here, sequences shorter than the maximum length should be padded with zeros.

It then unrolls the RNN dynamically, so that for each example the computation covers only the actual input length (excluding the padded zeros). Internally this uses a while loop in TensorFlow.
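The length-aware unrolling described above can be sketched in plain Python. This is a toy scalar RNN (input_size of 1), not TensorFlow's code; the function name `dynamic_rnn` is borrowed only for illustration, and the weights `w_x` and `w_h` are hypothetical fixed values.

```python
import math


def dynamic_rnn(inputs, lengths, w_x=0.5, w_h=0.8):
    """Toy scalar RNN over a zero-padded batch.

    inputs  : list of rows, each padded with zeros to the same length
    lengths : true (unpadded) length of each row
    Returns the per-step outputs and the final state of each row.
    """
    batch_size = len(inputs)
    max_time = max(lengths)  # no need to step past the longest real sequence
    state = [0.0] * batch_size
    outputs = [[0.0] * len(row) for row in inputs]

    t = 0
    while t < max_time:  # stands in for the graph-level while loop
        for b in range(batch_size):
            if t < lengths[b]:
                state[b] = math.tanh(w_x * inputs[b][t] + w_h * state[b])
                outputs[b][t] = state[b]
            # Past the true length: state is carried through, output stays zero.
        t += 1
    return outputs, state


padded = [[1.0, 2.0, 3.0, 0.0, 0.0],
          [4.0, 5.0, 0.0, 0.0, 0.0]]
outputs, final_state = dynamic_rnn(padded, lengths=[3, 2])
```

The final state of each row is the state at its last real time step, so the padded zeros never influence what the encoder hands to the decoder.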

Upvotes: 1
