Andrey Gurevich

Reputation: 1

How to set the input for LSTM in Keras

I'm new to Keras, and I find it hard to understand the shape of the input data for the LSTM layer. The Keras documentation says the input data should be a 3D tensor with shape (nb_samples, timesteps, input_dim). I'm having trouble understanding this format. Does the timesteps variable represent the number of timesteps the network remembers?

In my data, a few time steps affect the output of the network, but I don't know how many in advance, i.e. I can't say that the previous 10 samples affect the output. For example, the input can be words that form sentences. There is an important correlation between the words in each sentence. I don't know the length of a sentence in advance, and it varies from one sentence to another. I do know when a sentence ends (i.e. I have a period that indicates the ending). Two different sentences have no effect on each other - there is no need to remember the previous sentence.

I'm using the LSTM network to learn a policy in reinforcement learning, so I don't have a fixed data set. The agent's policy will change the length of the sentences.

How should I shape my data? How should it be fed into the Keras LSTM layer?

Upvotes: 0

Views: 941

Answers (1)

Daniel Möller

Reputation: 86600

The timesteps dimension is the total length of your sequence.

If you're working with words, it's the number of words in each sentence.
If you're working with chars, it's the number of chars in each sequence.

In a variable sentence length case, you should set that dimension to None:

# For functional API models:
from keras.layers import Input
inputTensor = Input((None, input_dim))  # nb_samples doesn't participate in this definition

# For Sequential models:
from keras.layers import LSTM
LSTM(units, input_shape=(None, input_dim))  # nb_samples doesn't participate in this definition

There are two possible ways of working with variable lengths in Keras:

  • Fixed length with padding
  • Variable length separated in batches with same length

In the fixed length case, you create a dummy word/character that is meaningless and pad your sentences up to a maximum length, so all sentences have the same length. Then you add a Masking() layer that will ignore that dummy word/char.
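A minimal sketch of that padding approach, assuming raw feature vectors (input_dim, the layer sizes, and the random data below are placeholders; the zero rows are the dummy steps):

import numpy as np
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense

input_dim = 5  # features per time step (placeholder)

# three sentences of different lengths, each step a vector of input_dim features
sentences = [np.random.rand(n, input_dim) for n in (3, 7, 5)]

# pad every sentence with zero rows up to the longest one
maxlen = max(len(s) for s in sentences)
padded = np.zeros((len(sentences), maxlen, input_dim))
for i, s in enumerate(sentences):
    padded[i, :len(s)] = s  # final shape: (3, 7, input_dim)

model = Sequential()
model.add(Masking(mask_value=0., input_shape=(None, input_dim)))  # skips all-zero steps
model.add(LSTM(32))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')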

The Embedding layer already has a mask_zero parameter, so if you're working with embeddings you can make id 0 the dummy char/word.
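For instance (the vocabulary size and dimensions here are placeholders; id 0 is reserved as the padding token):

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

vocab_size = 1000  # placeholder; id 0 is reserved for padding
model = Sequential()
model.add(Embedding(vocab_size, 64, mask_zero=True))  # masks time steps whose id is 0
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')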

In the variable length case, you just separate your input data into smaller batches where all sentences in a batch have the same length, like here: Keras misinterprets training data shape
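A rough sketch of that batching idea (the names sentences, labels and model are illustrative and assumed from context; each batch contains only sentences of one length):

import numpy as np
from collections import defaultdict

# group (sentence, label) pairs by sentence length
by_length = defaultdict(list)
for x, y in zip(sentences, labels):  # each x has shape (steps, input_dim)
    by_length[len(x)].append((x, y))

# train batch by batch; within a batch all sentences share one length
for length, pairs in by_length.items():
    batch_x = np.array([x for x, _ in pairs])  # shape (batch, length, input_dim)
    batch_y = np.array([y for _, y in pairs])
    model.train_on_batch(batch_x, batch_y)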

Upvotes: 2
