sachinruk

Reputation: 9869

Getting state of predictions in LSTMs

I am attempting to generate Shakespeare text using the following model:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, TimeDistributed, Dense

model = Sequential()
model.add(Embedding(len_vocab, 64))
model.add(LSTM(256, return_sequences=True))
model.add(TimeDistributed(Dense(len_vocab, activation='softmax')))

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.summary()

The training set consists of characters converted to numbers, where x has shape (num_sentences, sentence_len) and y has the same shape; y is simply x offset by one character. In this case sentence_len=40.
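For concreteness, the pairs are built along these lines (an illustrative sketch only; the toy corpus and the chr2val mapping stand in for the ones in the notebook):

import numpy as np

text = "shall i compare thee to a summer's day? thou art more lovely and more temperate"
chars = sorted(set(text))
chr2val = {c: i for i, c in enumerate(chars)}  # character -> integer (assumed mapping)

vals = np.array([chr2val[c] for c in text])
sentence_len = 40
num_sentences = (len(vals) - 1) // sentence_len

x = np.zeros((num_sentences, sentence_len), dtype=np.int64)
y = np.zeros((num_sentences, sentence_len), dtype=np.int64)
for s in range(num_sentences):
    start = s * sentence_len
    x[s] = vals[start:start + sentence_len]           # input characters
    y[s] = vals[start + 1:start + sentence_len + 1]   # same window shifted by one character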

However, at prediction time I predict one character at a time. See below for how I fit and predict using the model:

for i in range(2):
    model.fit(x,y, batch_size=128, epochs=1)

    sentence = []
    letter = np.random.choice(len_vocab,1).reshape((1,1)) #choose a random letter
    for i in range(100):
        sentence.append(val2chr(letter))
        # Predict ONE letter at a time
        p = model.predict(letter)
        letter = np.random.choice(27,1,p=p[0][0])
    print(''.join(sentence))

However, regardless of how many epochs I train for, all I get is gibberish for the output. One possible reason is that the cell memory from the previous prediction is not carried over.

So the question is: how do I make sure the state is passed on to the next step before I predict?

The full Jupyter notebook example is here:

Edit 1:

I just realised that I would need to pass in the previous LSTM's hidden state, and not just the cell memory. I have since tried to redo the model as:

batch_size = 64

model = Sequential()
model.add(Embedding(len_vocab, 64, batch_size=batch_size))
model.add(LSTM(256, return_sequences=True, stateful=True))
model.add(TimeDistributed(Dense(len_vocab, activation='softmax')))

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.summary()

However, now I cannot predict one letter at a time, as the model expects a full batch of batch_size inputs.

Upvotes: 1

Views: 119

Answers (2)

sachinruk

Reputation: 9869

As @j-c-doe pointed out, you can use the stateful option with a batch size of one and transfer the weights. The other method that I found was to keep unrolling the LSTM and predicting as below:

for i in range(150):
    sentence.append(int2char[letter[-1]])            # record the most recent character
    p = model.predict(np.array(letter)[None, :])     # feed the entire history so far, shape (1, i+1)
    # sample the next character from the softmax at the last time step
    letter.append(np.random.choice(len(char2int), 1, p=p[0][-1])[0])

NOTE: The dimensionality of the input to predict is really important! np.array(letter)[None,:] gives a (1, i+1) shape. This way no modification to the model is required.

And most importantly, it keeps passing on the cell state memory and the hidden state. I'm not entirely sure whether stateful=True passes on the hidden state as well, or only the cell state.
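For reference, the weight-transfer route mentioned above might look roughly like this (a sketch only; it assumes the trained model, len_vocab and val2chr from the question):

# Rebuild the same architecture with batch size 1 and a stateful LSTM,
# then copy across the weights learned by the trained model.
pred_model = Sequential()
pred_model.add(Embedding(len_vocab, 64, batch_size=1))
pred_model.add(LSTM(256, return_sequences=True, stateful=True))
pred_model.add(TimeDistributed(Dense(len_vocab, activation='softmax')))
pred_model.set_weights(model.get_weights())

pred_model.reset_states()  # start generation from a clean LSTM state
letter = np.random.choice(len_vocab, 1).reshape((1, 1))
sentence = []
for _ in range(150):
    sentence.append(val2chr(letter))
    p = pred_model.predict(letter)                    # state is carried over between calls
    letter = np.random.choice(len_vocab, 1, p=p[0][0]).reshape((1, 1))
print(''.join(sentence))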

Upvotes: 0

J.C Doe

Reputation: 11

The standard way to train a char-rnn with Keras can be found in the official example: lstm_text_generation.py.

model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))

This model is trained on sequences of maxlen characters. While training this network, the LSTM states are reset after each sequence (stateful=False by default).
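The data preparation in that example amounts, roughly, to cutting the corpus into overlapping windows of maxlen characters and one-hot encoding them; a paraphrased sketch (the corpus path is a placeholder, not from the original post):

import numpy as np

text = open('corpus.txt').read().lower()   # placeholder path: load your own corpus here
chars = sorted(set(text))
char_indices = {c: i for i, c in enumerate(chars)}

maxlen, step = 40, 3
sentences, next_chars = [], []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i:i + maxlen])   # input window of maxlen characters
    next_chars.append(text[i + maxlen])    # the single character to predict

x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1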

Once such a network is trained, you may want to feed and predict one character at a time. The simplest way to do that (that I know of) is to build another Keras model with the same structure, initialize it with the weights of the first one, but with its RNN layer in Keras "stateful" mode:

model = Sequential()
model.add(LSTM(128, stateful=True, batch_input_shape=(1, 1, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))

In this mode, Keras has to know the complete shape of a batch (see the doc here). Since you want to feed the network only one sample of one character at a time, the shape of a batch is (1, 1, len(chars)).
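Copying the weights and sampling one character at a time could then look roughly like this (a sketch; stateful_model is the second model above, trained_model is the first one, and chars/char_indices/indices_char are the usual lookups from the example):

import numpy as np

stateful_model.set_weights(trained_model.get_weights())  # same structure, so the weights fit
stateful_model.reset_states()                            # clear the LSTM state before generating

generated = chars[np.random.randint(len(chars))]         # random seed character
for _ in range(400):
    # one-hot encode a single character: batch of 1, sequence of 1 step
    x_pred = np.zeros((1, 1, len(chars)))
    x_pred[0, 0, char_indices[generated[-1]]] = 1
    preds = stateful_model.predict(x_pred)[0].astype('float64')
    preds /= preds.sum()                                 # renormalise to guard against float32 rounding
    next_index = np.random.choice(len(chars), p=preds)
    generated += indices_char[next_index]
print(generated)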

Upvotes: 1
