sachinruk

Reputation: 9869

Concatenate Embedding layers

I'm trying to create a model which takes words as inputs. Most of those words are covered by the GloVe word-vector set (~50,000), but some of the frequent words are not (~1,000). The question is: how do I concatenate the following two embedding layers to create one giant Embedding lookup table?

import numpy as np
from keras.layers import Embedding

trained_em = Embedding(50000, 50,  # pre-trained GloVe rows, kept frozen
                       weights=[np.array([word2glove[w] for w in words_in_glove])],
                       trainable=False)
untrained_em = Embedding(1000, 50)  # frequent words not covered by GloVe

As far as I understand, these are simply two lookup tables with the same number of dimensions, so I'm hoping that there is a way to stack them.

Edit 1: I just realised that this is probably going to take more than stacking Embedding layers, because the input sequence would contain indices from 0-50999, while untrained_em above only expects indices from 0-999. So perhaps a different solution is required.

Edit 2: This is what I would expect to do in a numpy array representing the Embedding:

np.random.seed(42)  # set seed for reproducibility
pretrained = np.random.randn(15, 3)
untrained = np.random.randn(5, 3)
final_embedding = np.vstack([pretrained, untrained])  # one combined 20-row lookup table

word_idx = [2, 5, 19]
np.take(final_embedding, word_idx, axis=0)  # look up rows by word index

I believe the last bit can be done with keras.backend.gather, but I'm not sure how to put it all together.
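For reference, here is a rough backend sketch of what I have in mind, assuming K.gather selects rows the same way np.take does along axis 0 (this is just an illustration with made-up sizes, not a working layer):

import numpy as np
import keras.backend as K

pretrained = np.random.randn(15, 3).astype('float32')  # stand-in for the frozen GloVe rows
untrained = np.random.randn(5, 3).astype('float32')    # stand-in for the trainable rows

# Stack the two tables into one 20-row embedding tensor.
final_embedding = K.concatenate([K.constant(pretrained), K.variable(untrained)], axis=0)

# K.gather plays the role of np.take(..., axis=0): it selects whole rows by index.
word_idx = K.constant([2, 5, 19], dtype='int32')
rows = K.gather(final_embedding, word_idx)
print(K.eval(rows).shape)  # (3, 3)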

Upvotes: 1

Views: 3436

Answers (2)

sachinruk

Reputation: 9869

It turns out that I needed to implement a custom layer, which I wrote by tweaking the original Embedding class.

The two most important parts of the class below are self.embeddings = K.concatenate([fixed_weight, variable_weight], axis=0) and out = K.gather(self.embeddings, inputs). The first is hopefully self-explanatory, while the second picks out the relevant input rows from the embeddings table.

However, in the particular application I'm working on, it turns out that a plain Embedding layer works better than this modified layer, perhaps because the learning rate is too high. I will report back after I have experimented more.

from keras.engine.topology import Layer
import keras.backend as K
from keras import initializers
import numpy as np

class Embedding2(Layer):

    def __init__(self, input_dim, output_dim, fixed_weights, embeddings_initializer='uniform', 
                 input_length=None, **kwargs):
        kwargs['dtype'] = 'int32'
        if 'input_shape' not in kwargs:
            if input_length:
                kwargs['input_shape'] = (input_length,)
            else:
                kwargs['input_shape'] = (None,)
        super(Embedding2, self).__init__(**kwargs)

        self.input_dim = input_dim
        self.output_dim = output_dim
        self.embeddings_initializer = embeddings_initializer
        self.fixed_weights = fixed_weights
        self.num_trainable = input_dim - len(fixed_weights)
        self.input_length = input_length

    def build(self, input_shape, name='embeddings'):
        initializer = initializers.get(self.embeddings_initializer)
        # Trainable part: rows for the words not covered by the pre-trained vectors.
        shape1 = (self.num_trainable, self.output_dim)
        variable_weight = K.variable(initializer(shape1), dtype=K.floatx(), name=name+'_var')

        # Fixed part: the pre-trained (e.g. GloVe) vectors passed in as fixed_weights.
        fixed_weight = K.variable(self.fixed_weights, name=name+'_fixed')

        self._trainable_weights.append(variable_weight)
        self._non_trainable_weights.append(fixed_weight)

        # One big lookup table: fixed (pre-trained) rows first, then the trainable rows.
        self.embeddings = K.concatenate([fixed_weight, variable_weight], axis=0)

        self.built = True

    def call(self, inputs):
        if K.dtype(inputs) != 'int32':
            inputs = K.cast(inputs, 'int32')
        # Row lookup into the combined table, analogous to np.take(embeddings, inputs, axis=0).
        out = K.gather(self.embeddings, inputs)
        return out

    def compute_output_shape(self, input_shape):
        if not self.input_length:
            input_length = input_shape[1]
        else:
            input_length = self.input_length
        return (input_shape[0], input_length, self.output_dim)
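For completeness, here is a rough sketch of how I would drop this layer into a model; the downstream layers and sizes are just placeholders, and word2glove / words_in_glove are from the question:

from keras.layers import Input, LSTM, Dense
from keras.models import Model

# First 50000 rows are the fixed GloVe vectors, the remaining 1000 rows are trainable.
fixed_weights = np.array([word2glove[w] for w in words_in_glove])  # shape (50000, 50)

inp = Input(shape=(None,), dtype='int32')  # sequences of word indices in [0, 50999]
emb = Embedding2(51000, 50, fixed_weights=fixed_weights)(inp)
hid = LSTM(64)(emb)                        # placeholder downstream layers
out = Dense(1, activation='sigmoid')(hid)

model = Model(inp, out)
model.compile(optimizer='adam', loss='binary_crossentropy')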

Upvotes: 3

Daniel Möller

Reputation: 86650

So, my suggestion is to use only one Embedding layer (taking your indexing problem into consideration) and to transfer the weights from the old layer to the new one.

So, what you're going to do in this suggestion is...

Create your new model with 51000 words:

import numpy as np
from keras.layers import Input, Embedding
from keras.models import Model

inp = Input((1,))
emb = Embedding(51000, 50)(inp)
out = ...  # the rest of the model goes here

model = Model(inp, out)

Now take the embedding layer and give it the weights you had:

# The pre-trained GloVe vectors for the first 50000 word indices.
weights = np.array([word2glove[w] for w in words_in_glove])

# layers[0] is the Input, layers[1] is the Embedding.
newWeights = model.layers[1].get_weights()[0]
newWeights[:50000, :] = weights
model.layers[1].set_weights([newWeights])

This will give you a new embedding, larger than the previous one, with a great part of its weights already trained and the remaining rows randomly initialized.
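If you want to double-check that the copy worked, a quick sanity check along these lines (optional, just for peace of mind) should pass:

checked = model.layers[1].get_weights()[0]
assert checked.shape == (51000, 50)
assert np.allclose(checked[:50000, :], weights)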

Unfortunately, with this approach you will have to let the entire embedding be trained.

Upvotes: 1
