Reputation: 241
In my model, I use GloVe pre-trained embeddings. I wish to keep them non-trainable in order to reduce the number of trainable parameters and avoid overfitting. However, I have a special symbol whose embedding I do want to train.
With the provided Embedding layer, the 'trainable' parameter applies to all embeddings at once, so I can only set it like this:
embedding_layer = Embedding(voc_size,
                            emb_dim,
                            weights=[embedding_matrix],
                            input_length=MAX_LEN,
                            trainable=False)
Is there a Keras-level solution to training only a subset of embeddings?
Please note:
Upvotes: 13
Views: 3491
Reputation: 241
Found a nice workaround, inspired by Keith's two embedding layers.
Main idea:
Assign the highest IDs to the special tokens (and the OOV token). Generate a 'sentence' containing only the special tokens, 0-padded elsewhere. Then apply non-trainable embeddings to the 'normal' sentence and trainable embeddings to the special-token sentence. Lastly, add the two results.
Works fine for me.
import numpy as np
from keras.layers import Input, Embedding, Lambda, Activation, Add

# vocab_len, special_tokens_len, emb_dim and MAX_SENT_LEN are assumed to be defined elsewhere

# Normal embs - '+2' for empty token and OOV token
embedding_matrix = np.zeros((vocab_len + 2, emb_dim))
# Special embs
special_embedding_matrix = np.zeros((special_tokens_len + 2, emb_dim))

# Here we may apply pre-trained embeddings to embedding_matrix

# Frozen layer holding the pre-trained vectors
embedding_layer = Embedding(vocab_len + 2,
                            emb_dim,
                            mask_zero=True,
                            weights=[embedding_matrix],
                            input_length=MAX_SENT_LEN,
                            trainable=False)

# Small trainable layer for the special tokens only
special_embedding_layer = Embedding(special_tokens_len + 2,
                                    emb_dim,
                                    mask_zero=True,
                                    weights=[special_embedding_matrix],
                                    input_length=MAX_SENT_LEN,
                                    trainable=True)

# Number of ordinary word IDs; special tokens take the IDs above this value
valid_words = vocab_len - special_tokens_len

sentence_input = Input(shape=(MAX_SENT_LEN,), dtype='int32')

# Create a vector of special tokens, e.g: [0,0,1,0,3,0,0]
# (shift the IDs down by valid_words, then ReLU clips ordinary words to 0)
special_tokens_input = Lambda(lambda x: x - valid_words)(sentence_input)
special_tokens_input = Activation('relu')(special_tokens_input)

# Apply both 'normal' embeddings and special token embeddings
embedded_sequences = embedding_layer(sentence_input)
embedded_special = special_embedding_layer(special_tokens_input)

# Add the matrices
embedded_sequences = Add()([embedded_sequences, embedded_special])
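To make the index trick easier to follow, here is a minimal NumPy sketch of the same arithmetic outside Keras. All sizes, the example sentence, and the random tables are made up for illustration. It also assumes two things the code above does not enforce by itself: the rows for special tokens in the frozen table are zero, and row 0 of the special table is zero, so each position gets a vector from exactly one table.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
vocab_len = 8             # all real tokens, special ones included
special_tokens_len = 3    # special tokens take the highest IDs: 6, 7, 8
emb_dim = 4
valid_words = vocab_len - special_tokens_len  # IDs 1..5 are ordinary words

# A padded sentence: 0 is padding, 6 and 8 are special tokens.
sentence = np.array([2, 5, 6, 1, 8, 0, 0])

# Same arithmetic as the Lambda + ReLU pair: shift, then clip negatives to 0.
special_ids = np.maximum(sentence - valid_words, 0)
print(special_ids)  # [0 0 1 0 3 0 0]

rng = np.random.default_rng(0)

# Stand-in for the frozen pre-trained table (assumption: padding row and
# special-token rows are zero, so they contribute nothing to the sum).
frozen = rng.normal(size=(vocab_len + 2, emb_dim))
frozen[0] = 0.0
frozen[valid_words + 1:] = 0.0

# Stand-in for the trainable table (assumption: row 0 is zero, because
# every non-special position is mapped to index 0).
special = rng.normal(size=(special_tokens_len + 2, emb_dim))
special[0] = 0.0

# Look up both tables and add, mirroring the Add() layer.
combined = frozen[sentence] + special[special_ids]
```

With these assumptions, ordinary words come out with their frozen vectors and special tokens with their trainable ones. Note that in the Keras version the whole special table is trainable, row 0 included, so it may be worth checking that this row stays close to zero after training.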
Upvotes: 10
Reputation: 590
I haven't found a nice solution like a mask for the Embedding layer. But here's what I've been meaning to try:
That would get you a solution with a small number of free parameters allocated to those embeddings.
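As a rough back-of-the-envelope check on "a small number of free parameters", the sketch below counts the weights in a frozen table and in a separate small trainable table. The vocabulary size, embedding dimension, and number of special tokens are hypothetical, and the '+2' rows follow the padding/OOV convention used elsewhere in this thread rather than anything specified in this answer.

```python
def embedding_params(rows, emb_dim):
    """Number of weights in an embedding table with the given shape."""
    return rows * emb_dim

# Hypothetical sizes, for illustration only.
vocab_len = 400_000        # size of a large pre-trained vocabulary
emb_dim = 100
special_tokens_len = 3

# '+2' rows for the padding and OOV entries (assumed convention).
frozen_params = embedding_params(vocab_len + 2, emb_dim)
trainable_params = embedding_params(special_tokens_len + 2, emb_dim)

print(frozen_params)     # 40000200
print(trainable_params)  # 500
```

Under these assumed sizes, only a few hundred embedding weights remain free to train while the pre-trained table stays fixed.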
Upvotes: 7