Reputation: 51
I was following a tutorial on generating English text with LSTMs, using Shakespeare's works as the training file. This is the model I am using, based on that tutorial-
model = Sequential()
model.add(LSTM(HIDDEN_DIM, input_shape=(None, VOCAB_SIZE), return_sequences=True))
model.add(Dropout(0.2))
for i in range(LAYER_NUM - 1):
    model.add(LSTM(HIDDEN_DIM, return_sequences=True))
model.add(TimeDistributed(Dense(VOCAB_SIZE)))
model.add(Activation('softmax'))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
After 30 epochs of training, I save the model using model.save('model.h5'). At this point, the model has learned the basic format and has picked up a few words. However, when I load the model in a new script with load_model('model.h5') and try to generate some text, it ends up predicting completely random letters and symbols. This led me to think that the model weights are not being restored properly, since I ran into the same problem when storing only the model weights. So is there any alternative for storing and restoring trained models with LSTM layers?
For reference, to generate text, the function picks a random starting character and then repeatedly feeds the sequence generated so far back into the model to predict the next character. This is the function-
def generate_text(model, length):
    # start from a random character index
    ix = [np.random.randint(VOCAB_SIZE)]
    y_char = [ix_to_char[ix[-1]]]
    X = np.zeros((1, length, VOCAB_SIZE))
    for i in range(length):
        # one-hot encode the most recent character at position i
        X[0, i, :][ix[-1]] = 1
        print(ix_to_char[ix[-1]], end="")
        # predict the next character from the sequence generated so far
        ix = np.argmax(model.predict(X[:, :i+1, :])[0], 1)
        y_char.append(ix_to_char[ix[-1]])
    return ('').join(y_char)
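For completeness, ix_to_char and VOCAB_SIZE come from a character/index mapping that I build from the raw training text, roughly like this (where data holds the contents of the training file)-
char = list(set(data))                            # unique characters in the training file
VOCAB_SIZE = len(char)
ix_to_char = {ix: c for ix, c in enumerate(char)}
char_to_ix = {c: ix for ix, c in enumerate(char)}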
EDIT
The snippet of code for training-
for nbepoch in range(1, 11):
    print('Epoch ', nbepoch)
    model.fit(X, y, batch_size=64, verbose=1, epochs=1)
    if nbepoch % 10 == 0:
        model.model.save('checkpoint_{}_epoch_{}.h5'.format(512, nbepoch))
        generate_text(model, 50)
        print('\n\n\n')
Where generate_text() just generates text one character at a time, starting from a randomly chosen character. After every 10 epochs of training, the entire model is saved as a .h5 file.
The code for loading the model-
print('Loading Model')
model = load_model('checkpoint_512_epoch_10.h5')
print('Model loaded')
generate_text(model, 400)
As far as predictions go, the generated text is reasonably structured during training and the model learns some words. However, when the saved model is loaded, the generated text is completely random, as if the weights had been randomly reinitialized.
Upvotes: 2
Views: 1071
Reputation: 51
After doing a bit of digging, I finally found out that the issue was the way I was creating the dictionary mapping between characters and one-hot vectors. I was using char = list(set(data)) to get a list of all the characters in the file, and then assigning each character's index in that list as its 'code number'. However, list(set(data)) does not always produce the same ordering: because of Python's hash randomization, the iteration order of a set of strings can differ between interpreter runs. So my dictionary mapping changed between saving and loading the model, since those happened in different scripts. Using char = sorted(list(set(data))) eliminates this problem.
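So the mapping construction now looks roughly like this; as an extra safety net, the character list could also be saved next to the model (e.g. as JSON) so that the generation script reloads exactly the same mapping instead of rebuilding it-
import json

char = sorted(list(set(data)))                    # sorted() gives a deterministic order across runs
VOCAB_SIZE = len(char)
ix_to_char = {ix: c for ix, c in enumerate(char)}
char_to_ix = {c: ix for ix, c in enumerate(char)}

# Optional: store the character list alongside the saved model, then load it in the
# generation script instead of recomputing it from the text
with open('chars.json', 'w') as f:
    json.dump(char, f)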
Upvotes: 2