Reputation: 323
I often see natural language processing tasks use an LSTM in a way that they first use an Embedding layer followed by an LSTM layer of the same size as the embedding, i.e. if a word is represented by a 1x300 vector, LSTM(300) is used.
E.g.:
from keras.models import Sequential
from keras.layers import Embedding, LSTM
model = Sequential()
model.add(Embedding(vocabulary, hidden_size, input_length=num_steps))
model.add(LSTM(hidden_size, return_sequences=True))  # LSTM units == embedding size
Is there a particular reason for doing so? For example, does it give a better representation of the meaning?
Upvotes: 1
Views: 158
Reputation: 11333
I don't think there's any special reason or need for this, and frankly I haven't seen that many cases myself where this is done (i.e. using LSTM hidden units == embedding size). The only effect it has is that there is a single memory cell for each embedding vector element, which I don't think is a requirement or a necessity.
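For example, here is a minimal Keras sketch (vocabulary and num_steps are placeholders, as in the question) where the LSTM width is chosen independently of the embedding dimension:

from keras.models import Sequential
from keras.layers import Embedding, LSTM

model = Sequential()
model.add(Embedding(vocabulary, 300, input_length=num_steps))  # 300-dimensional word vectors
model.add(LSTM(128, return_sequences=True))                    # 128 hidden units, independent of 300

The LSTM simply maps each 300-dimensional input vector into its own 128-dimensional hidden state.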
Having said that, I thought I might mention something extra. That is, there's a reason for having an Embedding layer in this setup. In fact, a very good reason. Let's consider the two options:

1. One-hot encoded vectors for word representations
2. Embeddings for word representations

Option 2 has several advantages over option 1.
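To make the contrast concrete, here is a rough sketch; the vocabulary size and dimensions are illustrative assumptions, not numbers from the question:

import numpy as np

vocabulary = 10000   # assumed vocabulary size
word_id = 42         # an arbitrary word index

# Option 1: a one-hot vector is sparse and vocabulary-sized
one_hot = np.zeros(vocabulary)
one_hot[word_id] = 1.0                                        # shape (10000,), all but one entry zero

# Option 2: an embedding is a dense, low-dimensional, trainable vector
embedding_matrix = np.random.normal(size=(vocabulary, 300))   # learned during training in practice
dense_vector = embedding_matrix[word_id]                      # shape (300,)

Besides being much smaller and dense, the embedding rows are learned, so words used in similar contexts can end up with similar vectors, which one-hot vectors cannot express.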
And if you use pre-trained word vectors to initialize the Embedding layer, even better. You are bringing word semantics into the task you are solving, and that always helps to solve the task better. This is analogous to asking a toddler that doesn't understand the meaning of words to do something text-related (e.g. order words in the correct grammatical order) vs asking a 3-year-old to do the same task. They both might eventually do it, but one will do it quicker and better.
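As a minimal sketch of that last point (assuming Keras; the file name and the pre-built embedding_matrix are hypothetical, and vocabulary/num_steps are placeholders as before):

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM
from keras.initializers import Constant

# embedding_matrix: a (vocabulary x 300) array of pre-trained vectors
# (e.g. GloVe/word2vec) built beforehand; the file name is hypothetical
embedding_matrix = np.load("embedding_matrix.npy")

model = Sequential()
model.add(Embedding(vocabulary, 300,
                    embeddings_initializer=Constant(embedding_matrix),
                    input_length=num_steps,
                    trainable=False))   # freeze, or set trainable=True to fine-tune
model.add(LSTM(128, return_sequences=True))

Freezing the layer keeps the pre-trained semantics intact; setting trainable=True lets them be fine-tuned on the task.

Upvotes: 1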