Reputation: 9869
I am attempting to generate shakespeare text using the following model:
model = Sequential()
model.add(Embedding(len_vocab, 64))
model.add(LSTM(256, return_sequences=True))
model.add(TimeDistributed(Dense(len_vocab, activation='softmax')))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.summary()
The training set consists of characters converted to numbers, where x is of shape (num_sentences, sentence_len) and y has the same shape; y is simply x offset by one character. In this case sentence_len=40.
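For reference, one way such an x/y pair can be built is sketched below; the text string and the character/index mapping here are assumptions for illustration, not taken from the original notebook:
import numpy as np

chars = sorted(set(text))                          # vocabulary of distinct characters (assumed raw corpus `text`)
char_to_ix = {c: i for i, c in enumerate(chars)}   # character -> integer index
len_vocab = len(chars)
sentence_len = 40

encoded = np.array([char_to_ix[c] for c in text])
num_sentences = (len(encoded) - 1) // sentence_len

# x: (num_sentences, sentence_len); y: same shape, shifted one character ahead
x = encoded[:num_sentences * sentence_len].reshape(num_sentences, sentence_len)
y = encoded[1:num_sentences * sentence_len + 1].reshape(num_sentences, sentence_len)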
However, when I predict, I do so one character at a time. See below for how I fit and predict using the model:
for i in range(2):
    model.fit(x,y, batch_size=128, epochs=1)
    sentence = []
    letter = np.random.choice(len_vocab,1).reshape((1,1)) # choose a random letter
    for i in range(100):
        sentence.append(val2chr(letter))
        # Predict ONE letter at a time
        p = model.predict(letter)
        letter = np.random.choice(27,1,p=p[0][0])
    print(''.join(sentence))
However, regardless of how many epochs I train, all I get is gibberish for the output. One possible reason is that I do not carry over the cell memory from the previous prediction.
So the question is: how do I make sure the state is passed on to the next prediction step?
A full Jupyter notebook example is here:
I just realised that I would need to pass in the previous LSTM's hidden state, not just the cell memory. I have since tried to redo the model as:
batch_size = 64
model = Sequential()
model.add(Embedding(len_vocab, 64, batch_size=batch_size))
model.add(LSTM(256, return_sequences=True, stateful=True))
model.add(TimeDistributed(Dense(len_vocab, activation='softmax')))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.summary()
However, now I cannot predict one letter at a time, as the model expects a full batch_size of inputs.
Upvotes: 1
Views: 119
Reputation: 9869
As @j-c-doe pointed out, you can use the stateful option with a batch size of one and transfer the weights. The other method that I found was to keep unrolling the LSTM and predicting as below:
for i in range(150):
    sentence.append(int2char[letter[-1]])
    p = model.predict(np.array(letter)[None,:])
    letter.append(np.random.choice(len(char2int),1,p=p[0][-1])[0])
NOTE: The dimensionality of the prediction is really important! np.array(letter)[None,:] gives a (1, i+1) shape. This way no modification to the model is required.
And most importantly, it keeps passing on the cell state memory and the hidden state. I'm not entirely sure whether stateful=True passes on the hidden state as well, or only the cell state.
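For context, the loop above assumes a seed along these lines (the exact seed is an assumption; what matters is that letter is a plain Python list of integer indices, so np.array(letter)[None,:] grows by one column per step):
sentence = []
letter = [np.random.choice(len(char2int))]  # start from a random character index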
Upvotes: 0
Reputation: 11
The standard way to train a char-rnn with Keras can be found in the official example: lstm_text_generation.py.
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))
This model is trained on sequences of maxlen characters. While training this network, the LSTM state is reset after each sequence (stateful=False by default).
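In that example the input is one-hot encoded, roughly along the following lines (abbreviated from the official script; text is the raw corpus):
import numpy as np

chars = sorted(set(text))
char_indices = {c: i for i, c in enumerate(chars)}

# cut the text into semi-redundant sequences of maxlen characters
maxlen, step = 40, 3
sentences, next_chars = [], []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])

# one-hot encode: each sample is (maxlen, len(chars)); the target is the next character
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1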
Once such a network is trained, you may want to feed and predict one character at a time. The simplest way to do that (that I know of) is to build another Keras model with the same structure, but with the RNN layers in Keras "stateful" mode, and initialize it with the weights of the first one:
model = Sequential()
model.add(LSTM(128, stateful=True, batch_input_shape=(1, 1, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))
In this mode, Keras has to know the complete shape of a batch (see the doc here). Since you want to feed the network a single sample containing a single character step, the batch shape is (1, 1, len(chars)).
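A minimal sketch of the weight transfer and one-character-at-a-time generation with this stateful model; training_model, seed_char and indices_char are assumed names, not from the original answer:
import numpy as np

# copy the weights learned by the stateless training model into the stateful one
model.set_weights(training_model.get_weights())

model.reset_states()            # clear the LSTM state before generating
generated = seed_char           # assumed single seed character
for _ in range(400):
    x_pred = np.zeros((1, 1, len(chars)))
    x_pred[0, 0, char_indices[generated[-1]]] = 1   # one-hot encode the last character
    preds = model.predict(x_pred, verbose=0)[0]     # softmax output, shape (len(chars),)
    preds = preds / preds.sum()                     # guard against float rounding
    next_index = np.random.choice(len(chars), p=preds)
    generated += indices_char[next_index]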
Upvotes: 1