Reputation: 124
I'm training a 2-layer character LSTM with Keras to generate sequences of characters similar to the corpus I'm training on. However, once trained, the LSTM generates the same sequence over and over again.
I've seen suggestions for similar problems to increase the LSTM input sequence length, increase the batch size, add dropout layers, and increase the dropout amount. I've tried all of these and none of them seem to have fixed the issue. The one thing that has yielded some success is adding a random noise vector to each vector output by the LSTM during generation. This makes sense, since the LSTM uses the previous step's output to generate the next output. However, in general, if I add enough noise to break the LSTM out of its repetitive generation, the quality of the output degrades a great deal.
My LSTM training code is as follows:
import numpy
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils
from sklearn.model_selection import train_test_split

# [load data from file]
raw_text = collected_statements.lower()
# create mapping of unique chars to integers
chars = sorted(list(set(raw_text + '\b')))
char_to_int = dict((c, i) for i, c in enumerate(chars))
n_chars = len(raw_text)
n_vocab = len(chars)
# prepare input/output pairs: each input is seq_length characters,
# the target is the character that follows
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]),
               return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(256))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
# define the checkpoint
filepath="weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1,
                             save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# fix random seed for reproducibility
seed = 8
numpy.random.seed(seed)
# split into 80% for train and 20% for test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=seed)
# train the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=18,
          batch_size=256, callbacks=callbacks_list)
My generation code is as follows:
filename = "weights-improvement-18-1.5283.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')
int_to_char = dict((i, c) for i, c in enumerate(chars))
# pick a random seed
start = numpy.random.randint(0, len(dataX)-1)
pattern = unpadded_patterns[start]
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
# generate characters
for i in range(1000):
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    # scale to [0, 1] and add a small noise vector to try to break the repetition
    x = (x / float(n_vocab)) + (numpy.random.rand(1, len(pattern), 1) * 0.01)
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    #print(index)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    sys.stdout.write(result)
    pattern.append(index)
    pattern = pattern[1:len(pattern)]
print("\nDone.")
When I run the generation code, I get the same sequence over and over again:
we have the best economy in the history of our country." "we have the best
economy in the history of our country." "we have the best economy in the
history of our country." "we have the best economy in the history of our
country." "we have the best economy in the history of our country." "we
have the best economy in the history of our country." "we have the best
economy in the history of our country." "we have the best economy in the
history of our country." "we have the best economy in the history of our
country."
Is there anything else I could try that would help generate something other than the same sequence over and over?
Upvotes: 4
Views: 2110
Reputation: 33420
The model's output is a probability distribution over the next character given the previous characters, and in your text generation loop you simply take the character with maximum probability. Instead, it might help to inject some stochasticity (i.e. randomness) into this process by sampling the next character from the probability distribution generated by the model. One easy way to do this is to use the np.random.choice function:
# get the probability distribution generated by the model
prediction = model.predict(x, verbose=0)
# sample the next character based on the predicted probabilities
idx = np.random.choice(y.shape[1], 1, p=prediction[0])[0]
# the rest is the same...
This way the next selected character is not always the most probable one. Rather, all of the characters have a chance to be selected, guided by the probability distribution generated by your model. This stochasticity not only breaks the repetitive loop, but may also result in some more interesting generated text.
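As a rough sketch (not part of the original code), the sampling step could slot into your existing generation loop like this, reusing the pattern, n_vocab, int_to_char and model variables from your question; dropping the added noise is my assumption here, since the sampling already provides the randomness:
for i in range(1000):
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)  # no extra noise needed once we sample stochastically
    prediction = model.predict(x, verbose=0)
    # sample the next character index from the predicted distribution
    # instead of taking the argmax
    index = numpy.random.choice(len(prediction[0]), p=prediction[0])
    sys.stdout.write(int_to_char[index])
    pattern.append(index)
    pattern = pattern[1:len(pattern)]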
Additionally, you can control the amount of stochasticity by introducing a softmax temperature into the sampling process, as shown in @Primusa's answer, which is based on the Keras char-rnn example. Basically, the idea is to re-weight the probability distribution so that you can control how surprising (i.e. higher temperature/entropy) or predictable (i.e. lower temperature/entropy) the next selected character will be.
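As a toy illustration (my own sketch, not from either answer): dividing the log-probabilities by a temperature T and re-normalizing is the same as raising each probability to the power 1/T, which sharpens the distribution for T < 1 and flattens it for T > 1:
import numpy as np

def reweight(p, temperature):
    # equivalent to exp(log(p) / temperature) followed by re-normalization
    q = np.asarray(p, dtype='float64') ** (1.0 / temperature)
    return q / q.sum()

p = np.array([0.6, 0.3, 0.1])
print(reweight(p, 0.5))  # ~[0.78, 0.20, 0.02] -> more predictable
print(reweight(p, 2.0))  # ~[0.47, 0.33, 0.19] -> more surprising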
Upvotes: 2
Reputation: 13498
In your character generation, I would suggest sampling from the probabilities your model outputs instead of taking the argmax directly. This is what the Keras char-rnn example does to get diversity.
This is the code they use for sampling in their example:
def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
In your code you've got index = numpy.argmax(prediction). I'd suggest replacing that with index = sample(prediction[0]) (prediction has shape (1, n_vocab), so pass its first row to sample) and experimenting with temperatures of your choice. Keep in mind that higher temperatures make your output more random and lower temperatures make it less random.
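For instance, a quick way to compare a few temperatures side by side might look like this (the temperature values and the 200-character length are just illustrative choices, and the loop re-seeds from the question's dataX for simplicity):
for temperature in [0.5, 1.0, 1.2]:
    print("\n--- temperature: %s ---" % temperature)
    pattern = list(dataX[start])  # re-seed with the same starting pattern
    generated = []
    for i in range(200):
        x = numpy.reshape(pattern, (1, len(pattern), 1)) / float(n_vocab)
        prediction = model.predict(x, verbose=0)
        index = sample(prediction[0], temperature)  # sample instead of argmax
        generated.append(int_to_char[index])
        pattern.append(index)
        pattern = pattern[1:len(pattern)]
    print(''.join(generated))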
Upvotes: 4