sachinruk

Reputation: 9869

Getting state of predictions in LSTMs

I am attempting to generate Shakespeare text using the following model:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, TimeDistributed, Dense

model = Sequential()
model.add(Embedding(len_vocab, 64))
model.add(LSTM(256, return_sequences=True))
model.add(TimeDistributed(Dense(len_vocab, activation='softmax')))

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.summary()

The training set consists of characters converted to numbers, where x has shape (num_sentences, sentence_len) and y has the same shape; y is simply x offset by one character. In this case sentence_len=40.
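For concreteness, the pairs are built along these lines (an illustrative sketch only; the toy corpus and the chr2val mapping stand in for the ones in the notebook):

import numpy as np

text = "shall i compare thee to a summer's day? thou art more lovely and more temperate"
chars = sorted(set(text))
chr2val = {c: i for i, c in enumerate(chars)}  # character -> integer (assumed mapping)

vals = np.array([chr2val[c] for c in text])
sentence_len = 40
num_sentences = (len(vals) - 1) // sentence_len

x = np.zeros((num_sentences, sentence_len), dtype=np.int64)
y = np.zeros((num_sentences, sentence_len), dtype=np.int64)
for s in range(num_sentences):
    start = s * sentence_len
    x[s] = vals[start:start + sentence_len]           # input characters
    y[s] = vals[start + 1:start + sentence_len + 1]   # same window shifted by one character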

However, at prediction time I predict one character at a time. See below for how I fit and predict using the model:

for i in range(2):
    model.fit(x,y, batch_size=128, epochs=1)

    sentence = []
    letter = np.random.choice(len_vocab,1).reshape((1,1)) #choose a random letter
    for i in range(100):
        sentence.append(val2chr(letter))
        # Predict ONE letter at a time
        p = model.predict(letter)
        letter = np.random.choice(27,1,p=p[0][0])
    print(''.join(sentence))

However, regardless of how many epochs I train for, all I get is gibberish for the output. One possible reason is that the cell memory from the previous prediction is not carried over.

So the question is: how do I make sure the state is passed on to the next step before I predict?

The full Jupyter notebook example is here:

Edit 1:

I just realised that I would need to pass in the previous LSTM's hidden state, and not just the cell memory. I have since tried to redo the model as:

batch_size = 64

model = Sequential()
model.add(Embedding(len_vocab, 64, batch_size=batch_size))
model.add(LSTM(256, return_sequences=True, stateful=True))
model.add(TimeDistributed(Dense(len_vocab, activation='softmax')))

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.summary()

However, now I cannot predict one letter at a time, as the model expects a full batch of batch_size inputs.

Upvotes: 1

Views: 119

Answers (2)

sachinruk

Reputation: 9869

As @j-c-doe pointed out, you can use the stateful option with a batch size of one and transfer the weights. The other method that I found was to keep unrolling the LSTM and predicting as below:

for i in range(150):
    sentence.append(int2char[letter[-1]])            # record the most recent character
    p = model.predict(np.array(letter)[None, :])     # feed the entire history so far, shape (1, i+1)
    # sample the next character from the softmax at the last time step
    letter.append(np.random.choice(len(char2int), 1, p=p[0][-1])[0])

NOTE: The dimensionality of the input to predict is really important! np.array(letter)[None,:] gives a (1, i+1) shape. This way no modification to the model is required.

And most importantly, it keeps passing on the cell state memory and the hidden state. I'm not entirely sure whether stateful=True passes on the hidden state as well, or only the cell state.
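For reference, the weight-transfer route mentioned above might look roughly like this (a sketch only; it assumes the trained model, len_vocab and val2chr from the question):

# Rebuild the same architecture with batch size 1 and a stateful LSTM,
# then copy across the weights learned by the trained model.
pred_model = Sequential()
pred_model.add(Embedding(len_vocab, 64, batch_size=1))
pred_model.add(LSTM(256, return_sequences=True, stateful=True))
pred_model.add(TimeDistributed(Dense(len_vocab, activation='softmax')))
pred_model.set_weights(model.get_weights())

pred_model.reset_states()  # start generation from a clean LSTM state
letter = np.random.choice(len_vocab, 1).reshape((1, 1))
sentence = []
for _ in range(150):
    sentence.append(val2chr(letter))
    p = pred_model.predict(letter)                    # state is carried over between calls
    letter = np.random.choice(len_vocab, 1, p=p[0][0]).reshape((1, 1))
print(''.join(sentence))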

Upvotes: 0

J.C Doe

Reputation: 11

The standard way to train a char-rnn with Keras can be found in the official example: lstm_text_generation.py.

model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))

This model is trained on sequences of maxlen characters. While training this network, the LSTM states are reset after each sequence (stateful=False by default).
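The data preparation in that example amounts, roughly, to cutting the corpus into overlapping windows of maxlen characters and one-hot encoding them; a paraphrased sketch (the corpus path is a placeholder, not from the original post):

import numpy as np

text = open('corpus.txt').read().lower()   # placeholder path: load your own corpus here
chars = sorted(set(text))
char_indices = {c: i for i, c in enumerate(chars)}

maxlen, step = 40, 3
sentences, next_chars = [], []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i:i + maxlen])   # input window of maxlen characters
    next_chars.append(text[i + maxlen])    # the single character to predict

x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1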

Once such a network is trained, you may want to feed and predict one character at a time. The simplest way to do that (that I know of) is to build another Keras model with the same structure, initialize it with the weights of the first one, but with its RNN layer in Keras "stateful" mode:

model = Sequential()
model.add(LSTM(128, stateful=True, batch_input_shape=(1, 1, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))

In this mode, Keras has to know the complete shape of a batch (see the doc here). Since you want to feed the network only one sample of one character at a time, the shape of a batch is (1, 1, len(chars)).
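Copying the weights and sampling one character at a time could then look roughly like this (a sketch; stateful_model is the second model above, trained_model is the first one, and chars/char_indices/indices_char are the usual lookups from the example):

import numpy as np

stateful_model.set_weights(trained_model.get_weights())  # same structure, so the weights fit
stateful_model.reset_states()                            # clear the LSTM state before generating

generated = chars[np.random.randint(len(chars))]         # random seed character
for _ in range(400):
    # one-hot encode a single character: batch of 1, sequence of 1 step
    x_pred = np.zeros((1, 1, len(chars)))
    x_pred[0, 0, char_indices[generated[-1]]] = 1
    preds = stateful_model.predict(x_pred)[0].astype('float64')
    preds /= preds.sum()                                 # renormalise to guard against float32 rounding
    next_index = np.random.choice(len(chars), p=preds)
    generated += indices_char[next_index]
print(generated)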

Upvotes: 1
