MenorcanOrange

Reputation: 2825

Keras - Embedding layer

What does input_dim, output_dim and input_length mean in:

Embedding(input_dim, output_dim, input_length)

From the documentation I understand:

So, when my input is a word like google.com, each character is represented by an integer, e.g. [5, 2, 2, 5, 8, 3, 4, 1, 2, 9]. The maximum possible word length is 75 and there are 38 possible characters. How should I decide the input_dim, output_dim and input_length?

Upvotes: 9

Views: 7561

Answers (4)

Kamil

Reputation: 335

I had a particularly difficult time understanding the output_dim parameter, but being a visual person I found the image below helpful. Word embeddings transform the single integer values obtained from the tokenizer into an n-dimensional array. For example, the word 'cat' might have the value 20 from the tokenizer, but Keras's Embedding layer uses all the words in your vocabulary to construct word embeddings that capture the relationships between those words (including 'cat'). It finds 'dimensions' or features such as 'living being', 'feline', 'human', 'gender', etc. The word 'cat' then has a value for each dimension/feature. The output_dim parameter simply tells Keras how many dimensions/features you wish the embedding matrix to have.

[image: illustration of an embedding matrix with words and their feature dimensions]
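For instance, a minimal sketch along these lines (the vocabulary size of 1000, the output_dim of 5 and the id 20 for 'cat' are all hypothetical values):

from keras.models import Sequential
from keras.layers import Embedding

model = Sequential()
# 1000-word vocabulary, each word mapped to a vector with 5 features
model.add(Embedding(input_dim=1000, output_dim=5, input_length=1))

# the embedding matrix has shape (input_dim, output_dim) = (1000, 5);
# row 20 is the (initially random, later trained) vector for 'cat'
cat_vector = model.layers[0].get_weights()[0][20]
print(cat_vector)   # 5 values, one per dimension/feature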

Upvotes: 0

In order to use words for natural language processing or machine learning tasks, it is necessary to first map them onto a continuous vector space, thus creating word vectors or word embeddings. The Keras Embedding layer is useful for constructing such word vectors.

input_dim : the vocabulary size. This is how many unique words are represented in your corpus.

output_dim : the desired dimension of the word vector. For example, if output_dim = 100, then every word will be mapped onto a vector with 100 elements, whereas if output_dim = 300, then every word will be mapped onto a vector with 300 elements.

input_length : the length of your sequences. For example, if your data consists of sentences, then this variable represents how many words there are in a sentence. As disparate sentences typically contain different numbers of words, it is usually required to pad your sequences such that all sentences are of equal length. The keras.preprocessing.sequence.pad_sequences method can be used for this (https://keras.io/preprocessing/sequence/).
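For instance, a minimal sketch of such padding (the integer sequences here are just hypothetical tokenized sentences):

from keras.preprocessing.sequence import pad_sequences

# hypothetical integer-encoded sentences of unequal length
sequences = [[1, 2, 3, 4], [1, 2, 3, 5, 6, 4, 7, 8]]

# pad (or truncate) every sequence to a common length of 6
padded = pad_sequences(sequences, maxlen=6)
print(padded)
# [[0 0 1 2 3 4]
#  [3 5 6 4 7 8]]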

In Keras, it is possible to either 1) use pretrained word vectors such as GloVe or word2vec representations, or 2) learn the word vectors as part of the training process. This blog post (https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html) offers a tutorial on how to use GloVe pretrained word vectors. For option 2, Keras will randomly initialize vectors as the default option, and then learn optimal word vectors during the training process.
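For option 1, a hedged sketch of the pattern from that tutorial (embedding_matrix is a hypothetical (vocab_size, embedding_dim) NumPy array whose row i would hold the pretrained GloVe vector of word index i; vocab_size, embedding_dim and input_length are placeholder values):

import numpy as np
from keras.layers import Embedding

vocab_size = 10000      # assumed vocabulary size (input_dim)
embedding_dim = 100     # must match the dimensionality of the GloVe vectors

# hypothetical pretrained matrix; in practice it is filled from the GloVe file
embedding_matrix = np.zeros((vocab_size, embedding_dim))

embedding_layer = Embedding(input_dim=vocab_size,
                            output_dim=embedding_dim,
                            weights=[embedding_matrix],
                            input_length=50,
                            trainable=False)   # freeze the pretrained vectors

For option 2, simply omit the weights argument and leave trainable=True (the default); Keras then initializes the vectors randomly and learns them during training.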

Upvotes: 14

Vaasha

Reputation: 961

  • input_dim: is the size of the vocabulary to embed
  • output_dim: is the length of the embedding vector
  • input_length: is the maximum length of the input (sentence)

As shown in Explain with example: how embedding layers in keras works, you can turn a sentence into a list of integers (a vector or tensor). Below is an example with input_length = 6 (the maximum length of the sentence; if your sentence is longer, the remaining words are trimmed):

 'This is a text' --> [0 0 1 2 3 4]
 'This is a very long text, my friends' --> [1 2 3 5 6 4]
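A minimal sketch of producing such integer vectors with Keras's Tokenizer (the exact ids will differ from the ones above, since they depend on the fitted vocabulary):

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

texts = ['This is a text', 'This is a very long text, my friends']

tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)                     # build the word -> integer mapping
sequences = tokenizer.texts_to_sequences(texts)   # e.g. [[1, 2, 3, 4], [1, 2, 3, 5, 6, 4, 7, 8]]

# pad short sentences with 0 and trim long ones to input_length = 6
data = pad_sequences(sequences, maxlen=6, truncating='post')
print(data)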

Then, using Keras's Embedding layer, you can turn these vectors into embedding vectors of depth output_dim. For example, with output_dim = 3:

[0 0 1 2 3 4] --> 
array([[ 0.00251105,  0.00724941, -0.01146401],
   [ 0.00251105,  0.00724941, -0.01146401],
   [ 0.03071865,  0.00953215, -0.01349484],
   [ 0.02962008,  0.04860269, -0.04597988],
   [-0.01875228,  0.03349927, -0.03210936],
   [-0.02512982,  0.04811014,  0.03172458]], dtype=float32)

The last parameter, input_dim, is the size of the vocabulary that is mapped to embedding vectors. You can see it by running

model.layers[0].get_weights() 

since the embedding layer is usually the first layer of the model. If input_dim were 10, the embedding layer would contain ten vectors of size output_dim. Notice that the first element corresponds to the mapping of 0 in the input vector (0 --> [ 0.00251105, 0.00724941, -0.01146401]), the second to 1, and so on.

[array([[ 0.00251105,  0.00724941, -0.01146401],
    [ 0.03071865,  0.00953215, -0.01349484],
    [ 0.02962008,  0.04860269, -0.04597988],
    [-0.01875228,  0.03349927, -0.03210936],
    [-0.02512982,  0.04811014,  0.03172458],
    [-0.00569617, -0.02348857, -0.00098624],
    [ 0.01327456,  0.02390958,  0.00754261],
    [-0.04041355,  0.03457253, -0.02879228],
    [-0.02695872,  0.02807242,  0.03338097],
    [-0.02057508,  0.00174383,  0.00792078]], dtype=float32)]

Increasing input_dim allows you to map a bigger vocabulary, but also increases the number of parameters of the embedding layer. The number of parameters is input_dim x output_dim.
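A hedged sketch tying this together (input_dim=10, output_dim=3 and input_length=6 are just the illustrative values used above; the actual weight values are random until training):

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

model = Sequential()
model.add(Embedding(input_dim=10, output_dim=3, input_length=6))

# the weight matrix has shape (input_dim, output_dim) = (10, 3)
print(model.layers[0].get_weights()[0].shape)   # (10, 3)
print(model.count_params())                     # 10 * 3 = 30 parameters

# looking up an integer sequence returns one row of the matrix per integer
print(model.predict(np.array([[0, 0, 1, 2, 3, 4]])).shape)   # (1, 6, 3)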

As far as I understand, these vectors are initialized randomly and trained like any other layer by the optimizer. You can however use different algorithms like word2vec, or pretrained vectors like GloVe (https://nlp.stanford.edu/projects/glove/). The idea is that each word occupies a unique position in the space (described by its vector), so that you can apply vector math to the words' semantics (meaning). E.g. W('cheeseburger') - W('cheese') = W('hamburger'), or W('prince') - W('man') + W('woman') = W('princess'); see more e.g. at https://www.oreilly.com/learning/capturing-semantic-meanings-using-deep-learning.
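A purely illustrative toy example of that vector math (the three-dimensional vectors below are made up; real word2vec/GloVe vectors have hundreds of dimensions):

import numpy as np

# made-up toy vectors, just to illustrate the arithmetic
W = {
    'man':      np.array([1.0, 0.0, 0.2]),
    'woman':    np.array([1.0, 1.0, 0.2]),
    'prince':   np.array([0.5, 0.0, 0.9]),
    'princess': np.array([0.5, 1.0, 0.9]),
}

query = W['prince'] - W['man'] + W['woman']

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# the word whose vector is most similar to the query should be 'princess'
print(max(W, key=lambda w: cosine(W[w], query)))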

Upvotes: 7

zimmerrol

Reputation: 4951

By taking a look at the Keras documentation for the layer you can see this example:

from keras.models import Sequential
from keras.layers import Embedding
model = Sequential()
model.add(Embedding(1000, 64, input_length=10))
# the model will take as input an integer matrix of size (batch, input_length).
# the largest integer (i.e. word index) in the input should be no larger than 999 (vocabulary size).
# now model.output_shape == (None, 10, 64), where None is the batch dimension.

Using the values you gave in your post, you can apply the same idea and come up with these settings:

  • input_dim=38
  • input_length=75

while output_dim is a model hyperparameter which you still have to determine (you may have to try different values to find the optimal one).
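With those values, a hedged sketch would look like this (output_dim=8 is only a placeholder you would tune):

from keras.models import Sequential
from keras.layers import Embedding

model = Sequential()
# 38 possible characters (so every integer id must be below 38),
# sequences padded/truncated to 75 characters, output_dim chosen freely
model.add(Embedding(input_dim=38, output_dim=8, input_length=75))
# model.output_shape == (None, 75, 8)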

Edit: You can find additional information about embedding layers here.

Upvotes: 2
