Reputation: 3
How is an LSTM able to extract and represent long sequences of data while using just one value (the long memory / LM) to maintain all this information?
If multiple values were used, it might be possible for the model to somehow differentiate each piece of information that it processed and internalized into memory, but with just one value it becomes hard to understand how multiple pieces of information are represented. I mean, one iteration of the LSTM could store into the LM the same information as multiple iterations of the LSTM.
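To make concrete what I mean by the LM, here is a minimal sketch of a single LSTM step in plain NumPy, written under my own assumptions about the standard formulation (gate names, sizes, and the hidden_size = 1 choice are mine, just to illustrate the "one value" reading, not taken from any particular library):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: hidden_size = 1 to match the "just one value" reading,
# though in practice the cell state is a vector of hidden_size values.
input_size, hidden_size = 4, 1
rng = np.random.default_rng(0)

# One weight matrix per gate, acting on the concatenated [h_prev, x].
W_f, W_i, W_g, W_o = (rng.standard_normal((hidden_size, input_size + hidden_size))
                      for _ in range(4))
b_f = b_i = b_g = b_o = np.zeros(hidden_size)

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([h_prev, x])
    f = sigmoid(W_f @ z + b_f)   # forget gate
    i = sigmoid(W_i @ z + b_i)   # input gate
    g = np.tanh(W_g @ z + b_g)   # candidate values
    o = sigmoid(W_o @ z + b_o)   # output gate
    c = f * c_prev + i * g       # long memory (cell state) update
    h = o * np.tanh(c)           # short memory (hidden state)
    return h, c

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for t in range(3):
    h, c = lstm_step(rng.standard_normal(input_size), h, c)
    print(f"step {t}: c = {c}, h = {h}")
```

My confusion is about the line `c = f * c_prev + i * g`: everything the sequence has produced so far ends up folded into that single running quantity.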
If I think of the LM as a discrete quantity that represents a different state for each binary combination stored in it, it would be comprehensible. But all ANNs seem to work with a continuous approach through their weights and biases, and it would be hard for a model trained on a big dataset to produce the exact value that represents a given piece of information; it would be hard to reconcile a continuous function with all these discrete representations accurately.
When I talk about representation I don't necessarily mean encoded word embeddings, like in encoder-decoder models, but data in a format that the model is able to understand, process, and differentiate, as if it knew its meaning.
And if the LM is continuous information, it should be quantifying the intensity of some property, but in many contexts, like sentence encoding, there isn't a clear quantity to be quantified.
Also, the LM is limited to providing the short memory with values between -1 and 1, due to the tanh activation function, so even considering a normalized output, it looks like a pretty limited range to express all the information that could have been injected into the LSTM up to that point.
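As a tiny illustration of what I mean about the tanh bound (just my own check in NumPy, assuming the standard output equation h = o * tanh(c) from the sketch above):

```python
import numpy as np

# The cell state itself can grow well outside [-1, 1]...
c = np.array([-5.0, -1.0, 0.0, 2.0, 10.0])

# ...but what it hands to the short memory is squashed by tanh into (-1, 1).
print(np.tanh(c))  # approx. [-0.9999 -0.7616  0.      0.9640  1.    ]
```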
So, how does the representation of data really work in LSTMs? I would really appreciate an explanation and references to articles (or other texts) that could help me understand.
Thanks!
Understanding the LSTM Representational Mechanism.
Upvotes: 0
Views: 14