Reputation: 779
Which algorithm is used for embedding in Keras built-in function? Word2vec? Glove? Other?
https://keras.io/layers/embeddings/
Upvotes: 3
Views: 1829
Reputation: 179
The short answer is neither. In essence, an embedding layer such as Word2Vec or GloVe is just a small neural network module (usually a fully-connected layer) that projects a high-dimensional, sparse representation into a lower, n-dimensional dense vector.
When you insert a fresh, randomly initialized Embedding layer into your Keras network, Keras constructs a dense learnable matrix of shape [input_dim, output_dim].
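For intuition, here is a minimal NumPy sketch (the sizes are chosen arbitrarily for illustration, not taken from Keras itself) showing that this projection is equivalent to multiplying a one-hot vector by that weight matrix, i.e. a plain row lookup:
import numpy as np

# Illustrative only: an embedding is a lookup into a learnable
# [input_dim, output_dim] matrix, equivalent to one-hot x matrix.
input_dim, output_dim = 12, 3
W = np.random.randn(input_dim, output_dim)  # the learnable embedding matrix

token = 7                                   # an integer id in [0, input_dim)
one_hot = np.eye(input_dim)[token]          # sparse one-hot representation

via_matmul = one_hot @ W                    # fully-connected projection
via_lookup = W[token]                       # what an embedding layer actually computes

assert np.allclose(via_matmul, via_lookup)  # same 3-dimensional vector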
Concretely, let's say that you're inserting an Embedding layer to encode integer scalar month information (12 unique values) into a float vector of size 3. In Keras, you're going to declare your embedding as follows:
import numpy as np
import keras
from keras.models import Model
from keras.layers import Embedding, Input

x = Input(shape=(1000,))                                 # suppose seq_len=1000
embedding = Embedding(12 + 1, 3, input_length=1000)(x)   # 13 ids -> 3-dim vectors
model = Model(inputs=x, outputs=embedding)               # Functional API
model.summary()
Your embedding layer would have a summary as follows:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 1000)              0
_________________________________________________________________
embedding_1 (Embedding)      (None, 1000, 3)           39
=================================================================
Total params: 39
Trainable params: 39
Non-trainable params: 0
_________________________________________________________________
Notice that the number of learnable parameters is 39 = 13*3 (the +1 reserves an extra index for values that don't belong to any of the 12 known months, just in case).
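A quick way to confirm this, continuing the snippet above (get_layer(index=1) assumes the Embedding is the second layer, right after the InputLayer):
# The embedding's single weight matrix has shape (13, 3),
# which accounts for all 39 trainable parameters.
weights = model.get_layer(index=1).get_weights()[0]
print(weights.shape)         # (13, 3)
print(model.count_params())  # 39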
Also notice that while the input to the embedding has shape (None, 1000), its output has shape (None, 1000, 3). This means the very small dense weight matrix of size [13, 3] is applied to each of the 1000 input time-steps, so every month integer in 0-11 is converted into a float vector of size (3,).
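You can see this shape transformation directly by pushing a random batch of month ids through the model defined above (the batch size of 2 is arbitrary):
# Two sequences of 1000 month ids in, one 3-dim vector per time-step out.
months = np.random.randint(0, 12, size=(2, 1000))
vectors = model.predict(months)
print(vectors.shape)  # (2, 1000, 3)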
This also means that when you backpropagate from the final layer down to the embedding layer, the gradient at each of the 1000 time-steps of the embedding output flows (in a time_distributed manner) back into the small [13, 3] weight matrix, which is, essentially, the embedding layer itself.
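As a sanity check (a sketch only: the 'sgd' optimizer, 'mse' loss and random targets are arbitrary choices made here, not part of the example above), a single gradient step is enough to see the [13, 3] lookup table change:
# One training step updates the embedding weights, showing that
# backpropagation reaches the lookup table.
model.compile(optimizer='sgd', loss='mse')
before = model.get_layer(index=1).get_weights()[0].copy()
months = np.random.randint(0, 12, size=(2, 1000))
targets = np.random.randn(2, 1000, 3)           # dummy regression targets
model.fit(months, targets, epochs=1, verbose=0)
after = model.get_layer(index=1).get_weights()[0]
print(np.abs(after - before).max() > 0)         # True: weights were updated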
Please also refer to the official Keras documentation for the Embedding layer: https://keras.io/layers/embeddings/.
Upvotes: 7