miclat

Reputation: 241

Train only some word embeddings (Keras)

In my model, I use GloVe pre-trained embeddings. I wish to keep them non-trainable in order to decrease the number of model parameters and avoid overfitting. However, I have a special symbol whose embedding I do want to train.

Using the provided Embedding layer, I can only use the 'trainable' parameter to set the trainability of all embeddings at once, like this:

embedding_layer = Embedding(voc_size,
                            emb_dim,
                            weights=[embedding_matrix],
                            input_length=MAX_LEN,
                            trainable=False)

Is there a Keras-level solution to training only a subset of embeddings?

Please note:

  1. There is not enough data to generate new embeddings for all words.
  2. The answers I have found so far relate only to native TensorFlow, not Keras.

Upvotes: 13

Views: 3491

Answers (2)

miclat

Reputation: 241

Found a nice workaround, inspired by Keith's idea of two embedding layers.

Main idea:

Assign the special tokens (and the OOV token) the highest IDs. Derive a second 'sentence' that contains only the special tokens and is zero-padded everywhere else. Then apply the non-trainable embeddings to the 'normal' sentence and the trainable embeddings to the special-token sentence. Lastly, add the two.

Works fine for me.

    import numpy as np
    from keras.layers import Embedding, Input, Lambda, Activation, Add

    # Normal embeddings - '+2' for the empty (padding) token and the OOV token
    embedding_matrix = np.zeros((vocab_len + 2, emb_dim))
    # Special-token embeddings
    special_embedding_matrix = np.zeros((special_tokens_len + 2, emb_dim))

    # Here we may load the pre-trained (GloVe) vectors into embedding_matrix

    # Frozen embeddings for the regular vocabulary
    embedding_layer = Embedding(vocab_len + 2,
                                emb_dim,
                                mask_zero=True,
                                weights=[embedding_matrix],
                                input_length=MAX_SENT_LEN,
                                trainable=False)

    # Trainable embeddings for the special tokens only
    special_embedding_layer = Embedding(special_tokens_len + 2,
                                        emb_dim,
                                        mask_zero=True,
                                        weights=[special_embedding_matrix],
                                        input_length=MAX_SENT_LEN,
                                        trainable=True)

    valid_words = vocab_len - special_tokens_len

    sentence_input = Input(shape=(MAX_SENT_LEN,), dtype='int32')

    # Shift the IDs so that only special tokens stay positive,
    # producing a vector like [0,0,1,0,3,0,0]
    special_tokens_input = Lambda(lambda x: x - valid_words)(sentence_input)
    special_tokens_input = Activation('relu')(special_tokens_input)

    # Apply both the 'normal' embeddings and the special-token embeddings
    embedded_sequences = embedding_layer(sentence_input)
    embedded_special = special_embedding_layer(special_tokens_input)

    # Add the two embedding tensors
    embedded_sequences = Add()([embedded_sequences, embedded_special])
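
For completeness, the summed embeddings can then feed whatever sits on top of the embedding. A minimal sketch, assuming a classification head; the LSTM width, `num_classes`, and the loss are placeholders I made up, not part of the original setup:

    from keras.layers import LSTM, Dense
    from keras.models import Model

    # Hypothetical downstream model, just to show how the summed embeddings are used
    x = LSTM(128)(embedded_sequences)                      # any Conv/LSTM/etc. works here
    predictions = Dense(num_classes, activation='softmax')(x)

    model = Model(inputs=sentence_input, outputs=predictions)
    model.compile(optimizer='adam', loss='categorical_crossentropy')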

Upvotes: 10

Keith

Reputation: 590

I haven't found a nice built-in solution, such as a mask for the Embedding layer, but here's what I've been meaning to try:

  • Two embedding layers - one trainable and one not
  • The non-trainable one has all the GloVe embeddings for in-vocab words and zero vectors for the others
  • The trainable one only maps the OOV words and special symbols
  • The output of these two layers is added (I was thinking of this like ResNet)
  • The Conv/LSTM/etc below the embedding is unchanged

That would get you a solution with a small number of free parameters allocated to those embeddings.
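
A minimal sketch of this idea, assuming the ID remapping is done in preprocessing: one input array of full-vocab IDs for the frozen layer, and one of IDs into a small OOV/special vocabulary for the trainable layer. The names, sizes, and the LSTM head below are illustrative placeholders, not from the answer:

    import numpy as np
    from keras.layers import Input, Embedding, Add, LSTM, Dense
    from keras.models import Model

    # Frozen layer weights: GloVe vectors for in-vocab words, zero rows elsewhere
    glove_matrix = np.zeros((vocab_size, emb_dim))   # fill with GloVe where available

    # Full-vocab IDs, and remapped special-token IDs (0 = 'no special token here')
    word_ids = Input(shape=(max_len,), dtype='int32')
    special_ids = Input(shape=(max_len,), dtype='int32')

    frozen_emb = Embedding(vocab_size, emb_dim,
                           weights=[glove_matrix],
                           trainable=False)(word_ids)
    trainable_emb = Embedding(num_special + 1, emb_dim,   # small vocab: OOV + special symbols
                              trainable=True)(special_ids)

    # Add the two embeddings (ResNet-style sum); the layers below are unchanged
    embeddings = Add()([frozen_emb, trainable_emb])
    x = LSTM(64)(embeddings)
    out = Dense(1, activation='sigmoid')(x)

    model = Model(inputs=[word_ids, special_ids], outputs=out)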

Upvotes: 7
