Reputation: 1
I'm new to Keras, and I find it hard to understand the shape of the input data for the LSTM layer. The Keras documentation says that the input data should be a 3D tensor with shape (nb_samples, timesteps, input_dim). I'm having trouble understanding this format. Does the timesteps variable represent the number of time steps the network remembers?
In my data, a few time steps affect the output of the network, but I don't know how many in advance, i.e. I can't say that the previous 10 samples affect the output. For example, the input can be words that form sentences. There is an important correlation between the words in each sentence. I don't know the length of a sentence in advance, and this length also varies from one sentence to another. I do know when a sentence ends (i.e. I have a period that indicates the ending). Two different sentences have no effect on each other - there is no need to remember the previous sentence.
I'm using the LSTM network for learning a policy in reinforcement learning, so I don't have a fixed data set. The agent's policy will change the length of the sentence.
How should I shape my data? How should it be fed into the Keras LSTM layer?
Upvotes: 0
Views: 941
Reputation: 86600
Time steps is the total length of your sequence.
If you're working with words, it's the number of words in each sentence.
If you're working with characters, it's the number of characters in each sequence.
In a variable sentence length case, you should set that dimension to None:
from keras.layers import Input, LSTM

#for functional API models:
inputTensor = Input((None, input_dim))  #the nb_samples dimension doesn't participate in this definition

#for sequential models:
LSTM(units, input_shape=(None, input_dim))  #the nb_samples dimension doesn't participate in this definition
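To make the 3D shape concrete, here is a minimal sketch with made-up numbers (32 sentences, 10 words each, every word encoded as a 50-dimensional vector):

import numpy as np

nb_samples, timesteps, input_dim = 32, 10, 50
x = np.random.random((nb_samples, timesteps, input_dim))
print(x.shape)  #(32, 10, 50) -- the (nb_samples, timesteps, input_dim) layout the LSTM layer expects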
There are two possible ways of working with variable lengths in Keras.
In the fixed length case, you create a dummy word/character that is meaningless and pad your sentences up to a maximum length, so all sentences have the same length. Then you add a Masking() layer that will ignore that dummy word/char.
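A rough sketch of that fixed-length approach (the sizes and the 0.0 padding value are just assumptions for illustration):

import numpy as np
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense

input_dim = 50
sentences = [np.random.random((length, input_dim)) for length in (4, 7, 10)]  #3 sentences of different lengths

#pad every sentence with all-zero rows up to the longest one
max_len = max(len(s) for s in sentences)
x = np.zeros((len(sentences), max_len, input_dim))
for i, s in enumerate(sentences):
    x[i, :len(s)] = s
y = np.random.random((len(sentences), 1))

model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(max_len, input_dim)))  #timesteps that are all zeros get ignored
model.add(LSTM(16))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit(x, y, epochs=1)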
The Embedding layer already has a mask_zero parameter, so if you're working with embeddings you can make id 0 be the dummy char/word.
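For example, a sketch assuming word ids where id 0 is reserved for padding (the vocabulary size and embedding size are made up):

import numpy as np
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

vocab_size = 10000  #hypothetical vocabulary, real word ids start at 1
sentences = [[5, 42, 7], [8, 2, 91, 33, 4]]  #variable-length sentences as word ids
x = pad_sequences(sentences, padding='post', value=0)  #pad with the reserved id 0
y = np.random.random((len(sentences), 1))

model = Sequential()
model.add(Embedding(vocab_size, 64, mask_zero=True))  #id 0 is treated as padding and masked out
model.add(LSTM(16))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit(x, y, epochs=1)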
In the variable length case, you just separate your input data into smaller batches, one sentence length per batch, like here: Keras misinterprets training data shape
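A sketch of that batch-per-length idea (batch sizes and lengths are made up; the timesteps dimension stays None so each batch can have its own length):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

input_dim = 50
model = Sequential()
model.add(LSTM(16, input_shape=(None, input_dim)))  #None: each batch may have a different length
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

#hypothetical batches: 8 sentences of 5 words, then 4 sentences of 12 words
for xb in (np.random.random((8, 5, input_dim)), np.random.random((4, 12, input_dim))):
    yb = np.random.random((len(xb), 1))
    model.train_on_batch(xb, yb)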
Upvotes: 2