Felix

Reputation: 323

LSTM num_units size, i.e. size of the hidden layer

I often see natural language processing tasks use an LSTM in such a way that an embedding layer is followed by an LSTM layer whose size matches the embedding dimension, i.e. if a word is represented by a 1x300 vector, LSTM(300) is used.

E.g.:

from keras.models import Sequential
from keras.layers import Embedding, LSTM

model = Sequential()
model.add(Embedding(vocabulary, hidden_size, input_length=num_steps))
model.add(LSTM(hidden_size, return_sequences=True))

Is there a particular reason for doing so, like getting a better representation of the meaning?

Upvotes: 1

Views: 158

Answers (1)

thushv89

Reputation: 11333

I don't think there's any special reason or need for this, and frankly I haven't seen many cases myself where the LSTM hidden size is set equal to the embedding size. The only effect this has is that there is one memory cell per embedding vector element, which is neither a requirement nor a necessity.
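
For instance, nothing stops you from picking a hidden size that differs from the embedding size. Here's a minimal sketch (the vocabulary size, sequence length, and unit counts below are arbitrary, chosen only for illustration):

from keras.models import Sequential
from keras.layers import Embedding, LSTM

vocabulary = 50000   # assumed vocabulary size
num_steps = 30       # assumed sequence length

model = Sequential()
# 300-dimensional embedding vectors ...
model.add(Embedding(vocabulary, 300, input_length=num_steps))
# ... feeding an LSTM with 128 units; the two sizes are independent
model.add(LSTM(128, return_sequences=True))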

Having said that, I thought I might mention something extra: there is a good reason (in fact, several) for having an Embedding layer in this setup. Let's consider the two options (a quick sketch contrasting them follows the list):

  1. Using one-hot encoding for word representations
  2. Using embeddings for word representations
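
To make the contrast concrete, here is a rough sketch of what each option looks like in Keras (the vocabulary size of 50000, sequence length, and LSTM size are assumed purely for illustration):

from keras.models import Sequential
from keras.layers import Embedding, LSTM

vocab_size = 50000   # assumed vocabulary size
num_steps = 30       # assumed sequence length

# Option 1: one-hot vectors fed straight into the LSTM.
# Every timestep is a sparse 50000-dimensional vector.
one_hot_model = Sequential()
one_hot_model.add(LSTM(128, input_shape=(num_steps, vocab_size)))

# Option 2: an Embedding layer maps word indices to dense 300-dimensional
# vectors that are learned along with the rest of the model.
embedding_model = Sequential()
embedding_model.add(Embedding(vocab_size, 300, input_length=num_steps))
embedding_model.add(LSTM(128))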

Option 2 has several advantages over option 1.

  • The dimensionality of the inputs is much smaller when you use an embedding layer (e.g. 300 as opposed to 50000).
  • You give the model the flexibility to learn a word representation that is in fact suited to the task you're solving. In other words, you are not forcing the representation of words to remain constant during training.
  • If you use pretrained word embeddings to initialize the Embedding layer, even better. You are bringing word semantics into the task you are solving, which always helps (a rough sketch follows below). This is analogous to asking a toddler who doesn't understand the meaning of words to do something text-related (e.g. order words into the correct grammatical order) versus asking a 3-year-old to do the same task. Both might eventually do it, but one will do it quicker and better.
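
As a rough sketch of that last point, and assuming you already have a matrix of pretrained vectors (the embedding_matrix below is just a random stand-in for, say, GloVe or word2vec vectors):

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM

vocab_size = 50000    # assumed vocabulary size
embedding_dim = 300   # assumed embedding size
num_steps = 30        # assumed sequence length

# Stand-in for a matrix of pretrained vectors (one row per word index)
embedding_matrix = np.random.rand(vocab_size, embedding_dim)

model = Sequential()
model.add(Embedding(vocab_size, embedding_dim,
                    weights=[embedding_matrix],  # initialize with pretrained vectors
                    input_length=num_steps,
                    trainable=True))             # allow fine-tuning for the task
model.add(LSTM(128, return_sequences=True))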

Upvotes: 1
