Reputation: 779
Which algorithm is used for embedding in Keras built-in function? Word2vec? Glove? Other?
https://keras.io/layers/embeddings/
Upvotes: 3
Views: 1829
Reputation: 179
The short answer is neither. In essence, an embedding layer such as Word2Vec or GloVe is just a small neural network module (usually a fully-connected layer) that projects a high-dimensional, sparse representation into a lower, n-dimensional dense vector.
When you insert a fresh, randomly initialized Embedding layer into your Keras network, Keras constructs a dense learnable matrix of shape [input_dim, output_dim].
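For intuition, here is a minimal NumPy sketch (the sizes are chosen arbitrarily for illustration, not taken from Keras itself) showing that this projection is equivalent to multiplying a one-hot vector by that weight matrix, i.e. a plain row lookup:
import numpy as np

# Illustrative only: an embedding is a lookup into a learnable
# [input_dim, output_dim] matrix, equivalent to one-hot x matrix.
input_dim, output_dim = 12, 3
W = np.random.randn(input_dim, output_dim)  # the learnable embedding matrix

token = 7                                   # an integer id in [0, input_dim)
one_hot = np.eye(input_dim)[token]          # sparse one-hot representation

via_matmul = one_hot @ W                    # fully-connected projection
via_lookup = W[token]                       # what an embedding layer actually computes

assert np.allclose(via_matmul, via_lookup)  # same 3-dimensional vector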
Concretely, let's say that you're inserting an Embedding layer to encode integer scalar month information (12 unique values) into a float vector of size 3. In Keras, you're going to declare your embedding as follows:
import numpy as np
import keras
from keras.models import Model
from keras.layers import Embedding, Input

x = Input(shape=(1000,))                                 # suppose seq_len=1000
embedding = Embedding(12 + 1, 3, input_length=1000)(x)   # 13 ids -> 3-dim vectors
model = Model(inputs=x, outputs=embedding)               # Functional API
model.summary()
Your embedding layer would have a summary as follows:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 1000)              0
_________________________________________________________________
embedding_1 (Embedding)      (None, 1000, 3)           39
=================================================================
Total params: 39
Trainable params: 39
Non-trainable params: 0
_________________________________________________________________
Notice that the number of learnable parameters is 39 = 13*3 (the +1 reserves an extra index for values that don't belong to any of the 12 known months, just in case).
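A quick way to confirm this, continuing the snippet above (get_layer(index=1) assumes the Embedding is the second layer, right after the InputLayer):
# The embedding's single weight matrix has shape (13, 3),
# which accounts for all 39 trainable parameters.
weights = model.get_layer(index=1).get_weights()[0]
print(weights.shape)         # (13, 3)
print(model.count_params())  # 39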
Also notice that while the input to the embedding has shape (None, 1000), its output has shape (None, 1000, 3). This means the very small dense weight matrix of size [13, 3] is applied to each of the 1000 input time-steps, so every month integer in 0-11 is converted into a float vector of size (3,).
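You can see this shape transformation directly by pushing a random batch of month ids through the model defined above (the batch size of 2 is arbitrary):
# Two sequences of 1000 month ids in, one 3-dim vector per time-step out.
months = np.random.randint(0, 12, size=(2, 1000))
vectors = model.predict(months)
print(vectors.shape)  # (2, 1000, 3)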
This also means that when you backpropagate from the final layer down to the embedding layer, the gradient at each of the 1000 time-steps of the embedding output flows (in a time_distributed manner) back into the small [13, 3] weight matrix, which is, essentially, the embedding layer itself.
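As a sanity check (a sketch only: the 'sgd' optimizer, 'mse' loss and random targets are arbitrary choices made here, not part of the example above), a single gradient step is enough to see the [13, 3] lookup table change:
# One training step updates the embedding weights, showing that
# backpropagation reaches the lookup table.
model.compile(optimizer='sgd', loss='mse')
before = model.get_layer(index=1).get_weights()[0].copy()
months = np.random.randint(0, 12, size=(2, 1000))
targets = np.random.randn(2, 1000, 3)           # dummy regression targets
model.fit(months, targets, epochs=1, verbose=0)
after = model.get_layer(index=1).get_weights()[0]
print(np.abs(after - before).max() > 0)         # True: weights were updated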
Please also refer to the official Keras documentation for the Embedding layer: https://keras.io/layers/embeddings/.
Upvotes: 7