seeker89

Reputation: 9

How to pass bert embeddings to an LSTM layer

I want to do sentiment analysis using BERT embeddings and an LSTM layer. This is my code:

i = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
x = bert_preprocess(i)
x = bert_encoder(x)
x = tf.keras.layers.Dropout(0.2, name="dropout")(x['pooled_output'])
x = tf.keras.layers.LSTM(128, dropout=0.2)(x)
x = tf.keras.layers.Dense(128, activation='relu')(x)
x = tf.keras.layers.Dense(1, activation='sigmoid', name="output")(x)

model = tf.keras.Model(i, x)

When I ran this code I got the following error:

ValueError: Input 0 of layer "lstm_2" is incompatible with the layer: expected 
ndim=3, found ndim=2. Full shape received: (None, 768)

Is the logic of my code correct? Can anyone please correct it?

Upvotes: 0

Views: 4184

Answers (1)

Deepak Sadulla

Reputation: 373

From BERT-like models you can generally expect three kinds of outputs (taken from Hugging Face's TFBertModel documentation):

  • last_hidden_state with shape (batch_size, sequence_length, hidden_size)
  • pooler_output with shape (batch_size, hidden_size)
  • hidden_states (optional): a tuple with one tensor of shape (batch_size, sequence_length, hidden_size) per layer (embeddings plus each encoder layer)

Here, hidden_size is 768 (the hidden width of BERT-base).
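For instance, here is a minimal sketch using Hugging Face's transformers to print those three shapes (the bert-base-uncased checkpoint and the example sentence are assumptions for illustration, not part of the original question):

import tensorflow as tf
from transformers import AutoTokenizer, TFBertModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = TFBertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer(["a short example sentence"], return_tensors="tf")
outputs = model(inputs, output_hidden_states=True)

print(outputs.last_hidden_state.shape)  # (1, seq_len, 768) -- one vector per token
print(outputs.pooler_output.shape)      # (1, 768)          -- one vector per sentence
print(len(outputs.hidden_states))       # 13: embeddings + 12 encoder layers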

As the error says, the LSTM expects a 3-dimensional input, but the output of the dropout layer (effectively the bert_encoder output, since dropout does not change tensor shape) has only 2 dimensions:

x = bert_encoder(x)
x = tf.keras.layers.Dropout(0.2, name="dropout")(x['pooled_output'])
x = tf.keras.layers.LSTM(128, dropout=0.2)(x)

So if you plan to use an LSTM layer after the bert_encoder layer, the LSTM needs a three-dimensional input of the form (batch_size, num_timesteps, num_features). That means using the last_hidden_state (or hidden_states) output instead of pooler_output; in the TF Hub encoder used in the question, the corresponding dict key is 'sequence_output' rather than 'pooled_output'. Which output to choose depends on your objective/use-case. A sketch of the corrected model follows.
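A minimal sketch of the corrected model, assuming bert_preprocess and bert_encoder are a matching TF Hub preprocessing/encoder pair (the handles below are one possible choice, not from the original question). The key change is feeding the LSTM the per-token 'sequence_output' instead of 'pooled_output':

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # registers the custom ops the preprocessing model needs

# Assumed TF Hub handles; swap in whichever pair you are actually using.
bert_preprocess = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
bert_encoder = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4")

i = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
x = bert_preprocess(i)
x = bert_encoder(x)

# 'sequence_output' is (batch_size, seq_length, 768): one vector per token,
# i.e. the 3-D (batch, timesteps, features) input the LSTM expects.
x = tf.keras.layers.Dropout(0.2, name="dropout")(x['sequence_output'])
x = tf.keras.layers.LSTM(128, dropout=0.2)(x)  # final hidden state: (batch_size, 128)
x = tf.keras.layers.Dense(128, activation='relu')(x)
x = tf.keras.layers.Dense(1, activation='sigmoid', name="output")(x)

model = tf.keras.Model(i, x)

Note that LSTM(128) without return_sequences=True returns only the final hidden state, so the downstream Dense layers see a 2-D tensor, just as in the original model.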

Upvotes: 1
