Tharindu

Reputation: 310

ELMo Embedding layer with Keras

I have been using the default Keras embedding layer with word embeddings in my architecture. The architecture looks like this:

left_input = Input(shape=(max_seq_length,), dtype='int32')
right_input = Input(shape=(max_seq_length,), dtype='int32')

embedding_layer = Embedding(len(embeddings), embedding_dim, weights=[embeddings], input_length=max_seq_length,
                            trainable=False)

# Embedded version of the inputs
encoded_left = embedding_layer(left_input)
encoded_right = embedding_layer(right_input)

# Since this is a siamese network, both sides share the same LSTM
shared_lstm = LSTM(n_hidden, name="lstm")

left_output = shared_lstm(encoded_left)
right_output = shared_lstm(encoded_right)

I want to replace the embedding layer with ELMo embeddings, so I used the custom embedding layer found in this repo: https://github.com/strongio/keras-elmo/blob/master/Elmo%20Keras.ipynb. The embedding layer looks like this:

class ElmoEmbeddingLayer(Layer):
    def __init__(self, **kwargs):
        self.dimensions = 1024
        self.trainable = True
        super(ElmoEmbeddingLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.elmo = hub.Module('https://tfhub.dev/google/elmo/2', trainable=self.trainable,
                               name="{}_module".format(self.name))
        self.trainable_weights += K.tf.trainable_variables(scope="^{}_module/.*".format(self.name))
        super(ElmoEmbeddingLayer, self).build(input_shape)

    def call(self, x, mask=None):
        result = self.elmo(K.squeeze(K.cast(x, tf.string), axis=1),
                           as_dict=True,
                           signature='default',
                           )['default']
        return result

    def compute_mask(self, inputs, mask=None):
        return K.not_equal(inputs, '--PAD--')

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.dimensions)

I changed the architecture to use the new embedding layer:

# The visible layer
left_input = Input(shape=(1,), dtype="string")
right_input = Input(shape=(1,), dtype="string")

embedding_layer = ElmoEmbeddingLayer()

# Embedded version of the inputs
encoded_left = embedding_layer(left_input)
encoded_right = embedding_layer(right_input)

# Since this is a siamese network, both sides share the same recurrent layer
shared_gru = GRU(n_hidden, name="lstm")

left_output = shared_gru(encoded_left)
right_output = shared_gru(encoded_right)

But I am getting this error:

ValueError: Input 0 is incompatible with layer lstm: expected ndim=3, found ndim=2

What am I doing wrong here?

Upvotes: 1

Views: 6883

Answers (2)

KMunro

Reputation: 356

I also used that repository as a guide to build a CustomELMo + BiLSTM + CRF model, and I needed to change the dict lookup to 'elmo' instead of 'default'. As Anna Krogager pointed out, when the dict lookup is 'default' the output is (batch_size, dim), which isn't enough dimensions for the LSTM. However, when the dict lookup is 'elmo' the layer returns a tensor of the right dimensions, namely of shape (batch_size, max_length, 1024).
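For reference, the shape difference between the two lookups can be checked by calling the hub module directly, outside of any Keras layer. A minimal sketch, assuming a TF 1.x environment with tensorflow_hub installed:

import tensorflow as tf
import tensorflow_hub as hub

elmo = hub.Module('https://tfhub.dev/google/elmo/2', trainable=False)
sentences = tf.constant(["the cat sat on the mat"])
outputs = elmo(sentences, signature='default', as_dict=True)

# 'default' is a fixed-size sentence embedding: shape (batch_size, 1024)
print(outputs['default'])
# 'elmo' is the per-token embedding sequence: shape (batch_size, max_length, 1024)
print(outputs['elmo'])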

Custom ELMo Layer:

class ElmoEmbeddingLayer(Layer):
    def __init__(self, **kwargs):
        self.dimensions = 1024
        self.trainable = True
        super(ElmoEmbeddingLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.elmo = hub.Module('https://tfhub.dev/google/elmo/2', trainable=self.trainable,
                               name="{}_module".format(self.name))
        self.trainable_weights += K.tf.trainable_variables(scope="^{}_module/.*".format(self.name))
        super(ElmoEmbeddingLayer, self).build(input_shape)

    def call(self, x, mask=None):
        result = self.elmo(K.squeeze(K.cast(x, tf.string), axis=1),
                           as_dict=True,
                           signature='default',
                           )['elmo']
        print(result)
        return result

    # def compute_mask(self, inputs, mask=None):
    #     return K.not_equal(inputs, '__PAD__')

    def compute_output_shape(self, input_shape):
        return input_shape[0], 48, self.dimensions  # 48 is my max_length

And the model is built as follows:

def build_model(): # uses crf from keras_contrib
    input = layers.Input(shape=(1,), dtype=tf.string)
    model = ElmoEmbeddingLayer(name='ElmoEmbeddingLayer')(input)
    model = Bidirectional(LSTM(units=512, return_sequences=True))(model)
    crf = CRF(num_tags)
    out = crf(model)
    model = Model(input, out)
    model.compile(optimizer="rmsprop", loss=crf_loss, metrics=[crf_accuracy, categorical_accuracy, mean_squared_error])
    model.summary()
    return model
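For completeness, here is a hypothetical usage sketch. train_sentences and train_tags are placeholder names, and I'm assuming each input is a padded, whitespace-joined token string (max_length 48, to match compute_output_shape) with one-hot tag labels:

import numpy as np

model = build_model()

# Placeholders: each example is one padded, whitespace-joined sentence string,
# reshaped to (num_examples, 1) to match Input(shape=(1,), dtype=tf.string)
X = np.array(train_sentences, dtype=object)[:, np.newaxis]
y = train_tags  # one-hot tags, shape (num_examples, 48, num_tags)

model.fit(X, y, batch_size=32, epochs=5)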

I hope my code is useful to you, even if it's not exactly the same model. Note that I had to comment out the compute_mask method, as it throws

InvalidArgumentError: Incompatible shapes: [32,47] vs. [32,0]    [[{{node loss/crf_1_loss/mul_6}}]]

where 32 is the batch size and 47 is one less than my specified max_length (presumably meaning it's accounting for a pad token itself). I haven't worked out the cause of that error yet, so it might be fine for you and your model. However, I notice you're using GRUs, and there's an unresolved issue on the repository about adding GRUs, so I'm curious whether you get that issue too.

Upvotes: 3

Anna Krogager

Reputation: 3588

The Elmo embedding layer outputs one embedding per input (the 'default' output, of shape (batch_size, dim)), whereas your LSTM expects a sequence (i.e. shape (batch_size, seq_length, dim)). I don't think it makes much sense to have an LSTM layer after an Elmo embedding layer, since Elmo already uses an LSTM to embed a sequence of words.
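Since the 'default' lookup already yields one fixed-size vector per sentence, one option is to drop the LSTM entirely and compare the two encodings directly. A sketch under that assumption (the Manhattan-distance Lambda head is my choice, not code from the question, and the layer's compute_mask may need to be removed, since Lambda layers don't support masking):

left_input = Input(shape=(1,), dtype="string")
right_input = Input(shape=(1,), dtype="string")

embedding_layer = ElmoEmbeddingLayer()
encoded_left = embedding_layer(left_input)    # shape (batch_size, 1024)
encoded_right = embedding_layer(right_input)  # shape (batch_size, 1024)

# Manhattan distance squashed into (0, 1], a common siamese similarity head
distance = Lambda(
    lambda tensors: K.exp(-K.sum(K.abs(tensors[0] - tensors[1]), axis=1, keepdims=True))
)([encoded_left, encoded_right])

model = Model([left_input, right_input], distance)
model.compile(optimizer="adam", loss="binary_crossentropy")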

Upvotes: 2
