Reputation: 93
I'm trying to set the initial state in an encoder which is composed of a Bidirectional LSTM layer to 0's. However, if I pass in a single matrix of 0's I get an error saying that a bidirectional layer has to be initialized with a list of tensors (makes sense). When I try to duplicate this 0's matrix into a list containing two of them (to initialize both RNNs), I get an error that the input shape is wrong. What am I missing here?
class Encoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
        super(Encoder, self).__init__()
        self.batch_sz = batch_sz
        self.enc_units = enc_units
        self.embedding = keras.layers.Embedding(vocab_size, embedding_dim)
        self.lstmb = keras.layers.Bidirectional(lstm(self.enc_units, dropout=0.1))

    def call(self, x, hidden):
        x = self.embedding(x)
        output, forward_h, forward_c, backward_h, backward_c = self.lstmb(x, initial_state=[hidden, hidden])
        return output, forward_h, forward_c, backward_h, backward_c

def initialize_hidden_state(batch_sz, enc_units):
    return tf.zeros((batch_sz, enc_units))
The error I get is:
ValueError: An `initial_state` was passed that is not compatible with `cell.state_size`. Received `state_spec`=[InputSpec(shape=(128, 512), ndim=2)]; however `cell.state_size` is [512, 512]
Note: the output of the function initialize_hidden_state is fed to the parameter hidden of the call function.
Upvotes: 2
Views: 6809
Reputation: 81
@BCJuan has the right answer, but I had to make some changes to make it work:
def initialize_hidden_state(batch_sz, enc_units):
    init_state = [tf.zeros((batch_sz, enc_units)) for i in range(2)]
    return init_state
Very important: use tf.zeros, not np.zeros, since it is expecting a tf.Tensor type.
If you are using a single LSTM layer in the Bidirectional wrapper, you need to return a list of 2 tf.Tensors to init each RNN: one for the forward pass and one for the backward pass.
Also, if you look at an example in TF's documentation, they use batch_sz and enc_units to specify the size of the hidden state.
Upvotes: 1
Reputation: 255
Reading all of the comments and answers, I think I managed to create a working example.
But first some notes:
self.lstmb will only return all five states if you specify it in the LSTM's constructor.
class Encoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
        super(Encoder, self).__init__()
        self.batch_sz = batch_sz
        self.enc_units = enc_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        # tell the LSTM you want the states and the sequences returned
        self.lstmb = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(self.enc_units,
                                                                        return_sequences=True,
                                                                        return_state=True,
                                                                        dropout=0.1))

    def call(self, x, hidden):
        x = self.embedding(x)
        # no need to pass [hidden, hidden], just pass it as is
        output, forward_h, forward_c, backward_h, backward_c = self.lstmb(x, initial_state=hidden)
        return output, forward_h, forward_c, backward_h, backward_c

    def initialize_hidden_state(self):
        # I stole this idea from iamlcc, so the credit is not mine.
        return [tf.zeros((self.batch_sz, self.enc_units)) for i in range(4)]
encoder = Encoder(vocab_inp_size, embedding_dim, units, BATCH_SIZE)
# sample input
sample_hidden = encoder.initialize_hidden_state()
sample_output, forward_h, forward_c, backward_h, backward_c = encoder(example_input_batch, sample_hidden)
print('Encoder output shape: (batch size, sequence length, units) {}'.format(sample_output.shape))
print('Encoder forward_h shape: (batch size, units) {}'.format(forward_h.shape))
print('Encoder forward_c shape: (batch size, units) {}'.format(forward_c.shape))
print('Encoder backward_h shape: (batch size, units) {}'.format(backward_h.shape))
print('Encoder backward_c shape: (batch size, units) {}'.format(backward_c.shape))
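Note that with the Bidirectional wrapper's default merge_mode='concat', sample_output actually comes out with shape (batch size, sequence length, 2 * units), since the forward and backward sequences are concatenated along the last axis; each of the four state tensors keeps shape (batch size, units).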
Upvotes: 2
Reputation: 321
I constructed my encoder with tf.keras.Model and ran into the same error. This PR may help you. In the end I built my model with tf.keras.layers.Layer, and I'm still working on it. I'll update once I succeed!
Upvotes: 0
Reputation: 369
If it's not too late, I think your initialize_hidden_state function should be:
def initialize_hidden_state(self):
    init_state = [tf.zeros((self.batch_sz, self.enc_units)) for i in range(4)]
    return init_state
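Since this version takes self, it is meant to live inside the Encoder class and be called as encoder.initialize_hidden_state(); the result can then be passed straight through as initial_state, as in the working example above (which also needs return_state=True on the wrapped LSTM for the five-value unpacking to work).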
Upvotes: 1
Reputation: 93
I ended up not using the bidirectional wrapper, and just created 2 LSTM layers with one of them receiving the parameter go_backwards=True, then concatenating the outputs, if it helps anyone.
I think the bidirectional Keras wrapper can't handle this sort of thing at the moment.
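For anyone curious, a rough sketch of what that looks like is below. It is only illustrative (the class name TwoWayEncoder and its details are made up here, not my exact code): two separate LSTM layers, one created with go_backwards=True, with the two output sequences concatenated along the feature axis.
import tensorflow as tf

class TwoWayEncoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, enc_units):
        super(TwoWayEncoder, self).__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.fwd_lstm = tf.keras.layers.LSTM(enc_units, return_sequences=True,
                                             return_state=True, dropout=0.1)
        # go_backwards=True makes this layer read the sequence in reverse
        self.bwd_lstm = tf.keras.layers.LSTM(enc_units, return_sequences=True,
                                             return_state=True, dropout=0.1,
                                             go_backwards=True)

    def call(self, x, hidden):
        # hidden is a list of two [h, c] pairs, one per direction
        x = self.embedding(x)
        fwd_out, fwd_h, fwd_c = self.fwd_lstm(x, initial_state=hidden[0])
        bwd_out, bwd_h, bwd_c = self.bwd_lstm(x, initial_state=hidden[1])
        # the backward layer returns its sequence in reversed time order,
        # so flip it back before concatenating along the feature axis
        bwd_out = tf.reverse(bwd_out, axis=[1])
        output = tf.concat([fwd_out, bwd_out], axis=-1)
        return output, fwd_h, fwd_c, bwd_h, bwd_c

    def initialize_hidden_state(self, batch_sz, enc_units):
        return [[tf.zeros((batch_sz, enc_units)), tf.zeros((batch_sz, enc_units))],
                [tf.zeros((batch_sz, enc_units)), tf.zeros((batch_sz, enc_units))]]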
Upvotes: 0
Reputation: 825
You are inputting a state size of (batch_size, hidden_units), but you should input a state with size (hidden_units, hidden_units). Also, it has to have 4 initial states: 2 for the 2 LSTM states and 2 more because you have one forward and one backward pass due to the bidirectional wrapper.
Try and change this:
def initialize_hidden_state(batch_sz, enc_units):
    return tf.zeros((batch_sz, enc_units))
To
def initialize_hidden_state(enc_units):
    init_state = [np.zeros((enc_units, enc_units)) for i in range(4)]
    return init_state
Hope this helps
Upvotes: 1