Reputation: 21
I am trying to understand LSTMs in Deeplearning4j. I am examining the source code of the example, but I can't understand this part:
//Allocate space:
//Note the order here:
// dimension 0 = number of examples in minibatch
// dimension 1 = size of each vector (i.e., number of characters)
// dimension 2 = length of each time series/example
INDArray input = Nd4j.zeros(currMinibatchSize, validCharacters.length, exampleLength);
INDArray labels = Nd4j.zeros(currMinibatchSize, validCharacters.length, exampleLength);
Why do we store a 3D array, and what does it mean?
Upvotes: 1
Views: 1822
Reputation: 5151
Good question. But this has nothing to do with how an LSTM functions; it has to do with the task itself. The task is to forecast the next character. Forecasting the next character has two facets: classification and approximation. If we were dealing with approximation alone, a one-dimensional array would suffice. But because we are dealing with approximation and classification simultaneously, we can't feed the neural network just a normalized ASCII representation of each character. We need to transform each character into an array.
For example, a lowercase 'a' will be represented this way:
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
A lowercase 'b' will be represented as:
0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
A lowercase 'c' will be represented as:
0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
And a capital 'Z' (note: capital!) will be represented as:
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
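This is one-hot encoding. A minimal sketch of that transformation in plain Java (illustrative only; the alphabet, class name, and variable names here are my assumptions, not code from the DL4J example):
public class OneHotDemo {
    public static void main(String[] args) {
        // Hypothetical alphabet; the real example builds validCharacters elsewhere.
        char[] validCharacters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".toCharArray();

        // One-hot encode 'b': a vector with exactly one 1 at the character's index.
        double[] oneHot = new double[validCharacters.length];
        for (int i = 0; i < validCharacters.length; i++) {
            if (validCharacters[i] == 'b') {
                oneHot[i] = 1.0; // every other slot stays 0.0
                break;
            }
        }
        System.out.println(java.util.Arrays.toString(oneHot));
    }
}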
So each character becomes a one-dimensional array, a sequence of characters becomes a two-dimensional array, and a minibatch of sequences becomes three-dimensional. How were all of those dimensions constructed? The code comment has the following explanation:
// dimension 0 = number of examples in minibatch
// dimension 1 = size of each vector (i.e., number of characters)
// dimension 2 = length of each time series/example
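To make the three dimensions concrete, here is a small sketch of how such a 3D array gets filled with ND4J (the sizes and indices are made up for illustration; the real example derives them from the training text):
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class MinibatchShapeDemo {
    public static void main(String[] args) {
        // Hypothetical sizes, for illustration only.
        int currMinibatchSize = 2;  // dimension 0: examples in the minibatch
        int numCharacters = 52;     // dimension 1: one-hot vector size
        int exampleLength = 10;     // dimension 2: time steps per example

        INDArray input = Nd4j.zeros(currMinibatchSize, numCharacters, exampleLength);

        // For example 0, at time step 3, the current character has index 1
        // ('b' in the alphabet above): set that single cell to 1.0, so the
        // slice input[0, :, 3] is exactly the one-hot vector shown earlier.
        input.putScalar(new int[]{0, 1, 3}, 1.0);

        System.out.println(java.util.Arrays.toString(input.shape()));
    }
}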
I sincerely want to commend you for your efforts in understanding how LSTMs work, but the code you pointed to gives an example that is applicable to all kinds of neural networks: it explains how to work with text data in a neural network, not how an LSTM works. For that you need to look into another part of the source code.
Upvotes: 1