Reputation: 93
I'm trying to set the initial state in an encoder which is composed of a Bidirectional LSTM layer to 0's. However, if I pass in a single matrix of 0's I get an error saying that a bidirectional layer has to be initialized with a list of tensors (makes sense). When I try to duplicate this 0's matrix into a list containing two of them (to initialize both RNNs), I get an error that the input shape is wrong. What am I missing here?
class Encoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
        super(Encoder, self).__init__()
        self.batch_sz = batch_sz
        self.enc_units = enc_units
        self.embedding = keras.layers.Embedding(vocab_size, embedding_dim)
        self.lstmb = keras.layers.Bidirectional(lstm(self.enc_units, dropout=0.1))

    def call(self, x, hidden):
        x = self.embedding(x)
        output, forward_h, forward_c, backward_h, backward_c = self.lstmb(x, initial_state=[hidden, hidden])
        return output, forward_h, forward_c, backward_h, backward_c

def initialize_hidden_state(batch_sz, enc_units):
    return tf.zeros((batch_sz, enc_units))
The error I get is:
ValueError: An `initial_state` was passed that is not compatible with `cell.state_size`. Received `state_spec`=[InputSpec(shape=(128, 512), ndim=2)]; however `cell.state_size` is [512, 512]
Note: the output of the function initialize_hidden_state is fed to the parameter hidden of the call function.
Upvotes: 2
Views: 6809
Reputation: 81
@BCJuan has the right answer, but I had to make some changes to make it work:
def initialize_hidden_state(batch_sz, enc_units):
    init_state = [tf.zeros((batch_sz, enc_units)) for i in range(2)]
    return init_state
Very important: use tf.zeros, not np.zeros, since it is expecting a tf.Tensor type.
If you are using a single LSTM layer in the Bidirectional wrapper, you need to return a list of 2 tf.Tensors to init each RNN: one for the forward pass and one for the backward pass.
Also, if you look at an example in TF's documentation, they use batch_sz and enc_units to specify the size of the hidden state.
Upvotes: 1
Reputation: 255
Reading all of the comments and answers, I think I managed to create a working example.
But first some notes:
self.lstmb will only return all five states if you specify it in the LSTM's constructor.
class Encoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
        super(Encoder, self).__init__()
        self.batch_sz = batch_sz
        self.enc_units = enc_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        # tell the LSTM you want the states and the sequences returned
        self.lstmb = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(self.enc_units,
                                                                        return_sequences=True,
                                                                        return_state=True,
                                                                        dropout=0.1))

    def call(self, x, hidden):
        x = self.embedding(x)
        # no need to pass [hidden, hidden], just pass it as is
        output, forward_h, forward_c, backward_h, backward_c = self.lstmb(x, initial_state=hidden)
        return output, forward_h, forward_c, backward_h, backward_c

    def initialize_hidden_state(self):
        # I stole this idea from iamlcc, so the credit is not mine.
        return [tf.zeros((self.batch_sz, self.enc_units)) for i in range(4)]
encoder = Encoder(vocab_inp_size, embedding_dim, units, BATCH_SIZE)
# sample input
sample_hidden = encoder.initialize_hidden_state()
sample_output, forward_h, forward_c, backward_h, backward_c = encoder(example_input_batch, sample_hidden)
print('Encoder output shape: (batch size, sequence length, units) {}'.format(sample_output.shape))
print('Encoder forward_h shape: (batch size, units) {}'.format(forward_h.shape))
print('Encoder forward_c shape: (batch size, units) {}'.format(forward_c.shape))
print('Encoder backward_h shape: (batch size, units) {}'.format(backward_h.shape))
print('Encoder backward_c shape: (batch size, units) {}'.format(backward_c.shape))
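Note that with the Bidirectional wrapper's default merge_mode='concat', sample_output actually comes out with shape (batch size, sequence length, 2 * units), since the forward and backward sequences are concatenated along the last axis; each of the four state tensors keeps shape (batch size, units).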
Upvotes: 2
Reputation: 321
I constructed my encoder with tf.keras.Model and ran into the same error. This PR may help you. In the end I built my model with tf.keras.layers.Layer, and I'm still working on it. I'll update once I succeed!
Upvotes: 0
Reputation: 369
If it's not too late, I think your initialize_hidden_state function should be:
def initialize_hidden_state(self):
    init_state = [tf.zeros((self.batch_sz, self.enc_units)) for i in range(4)]
    return init_state
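Since this version takes self, it is meant to live inside the Encoder class and be called as encoder.initialize_hidden_state(); the result can then be passed straight through as initial_state, as in the working example above (which also needs return_state=True on the wrapped LSTM for the five-value unpacking to work).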
Upvotes: 1
Reputation: 93
I ended up not using the bidirectional wrapper, and just created 2 LSTM layers with one of them receiving the parameter go_backwards=True, then concatenating the outputs, if it helps anyone.
I think the bidirectional Keras wrapper can't handle this sort of thing at the moment.
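For anyone curious, a rough sketch of what that looks like is below. It is only illustrative (the class name TwoWayEncoder and its details are made up here, not my exact code): two separate LSTM layers, one created with go_backwards=True, with the two output sequences concatenated along the feature axis.
import tensorflow as tf

class TwoWayEncoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, enc_units):
        super(TwoWayEncoder, self).__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.fwd_lstm = tf.keras.layers.LSTM(enc_units, return_sequences=True,
                                             return_state=True, dropout=0.1)
        # go_backwards=True makes this layer read the sequence in reverse
        self.bwd_lstm = tf.keras.layers.LSTM(enc_units, return_sequences=True,
                                             return_state=True, dropout=0.1,
                                             go_backwards=True)

    def call(self, x, hidden):
        # hidden is a list of two [h, c] pairs, one per direction
        x = self.embedding(x)
        fwd_out, fwd_h, fwd_c = self.fwd_lstm(x, initial_state=hidden[0])
        bwd_out, bwd_h, bwd_c = self.bwd_lstm(x, initial_state=hidden[1])
        # the backward layer returns its sequence in reversed time order,
        # so flip it back before concatenating along the feature axis
        bwd_out = tf.reverse(bwd_out, axis=[1])
        output = tf.concat([fwd_out, bwd_out], axis=-1)
        return output, fwd_h, fwd_c, bwd_h, bwd_c

    def initialize_hidden_state(self, batch_sz, enc_units):
        return [[tf.zeros((batch_sz, enc_units)), tf.zeros((batch_sz, enc_units))],
                [tf.zeros((batch_sz, enc_units)), tf.zeros((batch_sz, enc_units))]]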
Upvotes: 0
Reputation: 825
You are inputting a state size of (batch_size, hidden_units), but you should input a state with size (hidden_units, hidden_units). Also, it has to have 4 initial states: 2 for the 2 LSTM states and 2 more because you have one forward and one backward pass due to the bidirectional wrapper.
Try and change this:
def initialize_hidden_state(batch_sz, enc_units):
    return tf.zeros((batch_sz, enc_units))
To
def initialize_hidden_state(enc_units):
    init_state = [np.zeros((enc_units, enc_units)) for i in range(4)]
    return init_state
Hope this helps
Upvotes: 1