Reputation: 166
UPDATE: It was a mistake in the logic generating new characters. See answer below.
ORIGINAL QUESTION: I built an LSTM for character-level text generation with PyTorch. The model trains well (the loss decreases reasonably, etc.), but the trained model ends up outputting the last handful of words of the input repeated over and over again (e.g. Input: "She told her to come back later, but she never did"; Output: ", but she never did, but she never did, but she never did" and so on).
I have played around with the hyperparameters a bit, and the problem persists. I'm currently using the following (a rough sketch of how these settings fit together follows the list):
Loss function: BCE
Optimizer: Adam
Learning rate: 0.001
Sequence length: 64
Batch size: 32
Embedding dim: 128
Hidden dim: 512
LSTM layers: 2
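For reference, the model setup looks roughly like the sketch below. It's simplified, and the names (e.g. CharLSTM) are placeholders rather than my exact code, but it shows how the hyperparameters above fit together and where reset_cell() comes from:

import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    def __init__(self, vocab_size, embedding_dim=128, hidden_dim=512, num_layers=2):
        super().__init__()
        # +1 embedding row for unknown characters (vocab.get(char, len(vocab)))
        self.embedding = nn.Embedding(vocab_size + 1, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)
        self.hidden = None

    def reset_cell(self):
        # forget the carried-over hidden/cell state (called once per epoch)
        self.hidden = None

    def forward(self, x):
        embedded = self.embedding(x)                     # (batch, seq_len, embedding_dim)
        out, hidden = self.lstm(embedded, self.hidden)   # (batch, seq_len, hidden_dim)
        self.hidden = tuple(h.detach() for h in hidden)  # keep the state, cut the graph
        logits = self.fc(out[:, -1, :])                  # predict only the NEXT character
        return torch.sigmoid(logits)                     # probabilities, for BCE

lstm = CharLSTM(vocab_size=len(vocab))
loss = nn.BCELoss()
optimizer = torch.optim.Adam(lstm.parameters(), lr=0.001)

The sigmoid at the end is what makes the output compatible with the BCE loss and the one-hot targets in the training loop below.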
I also tried not always choosing the top prediction (i.e. sampling the next character instead), but this only introduces incorrect words and doesn't break the loop. I've looked at countless tutorials, and I can't quite figure out what I'm doing differently/wrong.
The following is the code for training the model. training_data is one long string, and I'm looping over it, predicting the next character for each substring of length SEQ_LEN. I'm not sure whether my mistake is here or elsewhere, but any comment or direction is highly appreciated!
loss_dict = dict()
for e in range(EPOCHS):
    print("------ EPOCH {} OF {} ------".format(e+1, EPOCHS))
    lstm.reset_cell()
    for i in range(0, DATA_LEN, BATCH_SIZE):
        if i % 50000 == 0:
            print(i/float(DATA_LEN))
        optimizer.zero_grad()
        # batch of BATCH_SIZE input windows, each SEQ_LEN characters long,
        # shifted by one character relative to the previous one
        input_vector = torch.tensor([[
            vocab.get(char, len(vocab))
            for char in training_data[i+b:i+b+SEQ_LEN]
        ] for b in range(BATCH_SIZE)])
        if USE_CUDA and torch.cuda.is_available():
            input_vector = input_vector.cuda()
        output_vector = lstm(input_vector)
        # one-hot targets: the character that follows each input window
        target_vector = torch.zeros(output_vector.shape)
        if USE_CUDA and torch.cuda.is_available():
            target_vector = target_vector.cuda()
        for b in range(BATCH_SIZE):
            target_vector[b][vocab.get(training_data[i+b+SEQ_LEN])] = 1
        error = loss(output_vector, target_vector)
        error.backward()
        optimizer.step()
        loss_dict[(e, int(i/BATCH_SIZE))] = error.detach().item()
Upvotes: 2
Views: 1268
Reputation: 166
ANSWER: I had made a stupid mistake when producing the characters with the trained model: I got confused with the batch size and assumed that at each step the network would predict an entire batch of new characters when in fact it only predicts a single one… That's why it simply repeated the end of the input. Yikes!
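For anyone hitting the same thing, the corrected generation loop looks roughly like the sketch below. It's simplified (the generate helper and the inv_vocab mapping are illustrative rather than my exact code, and it reuses vocab, SEQ_LEN and USE_CUDA from the question); the key point is that each forward pass yields ONE next character, which gets appended to the text and fed back in:

inv_vocab = {idx: char for char, idx in vocab.items()}

def generate(lstm, seed_text, num_chars):
    lstm.reset_cell()                                   # start from a clean hidden state
    generated = seed_text
    for _ in range(num_chars):
        context = generated[-SEQ_LEN:]                  # most recent SEQ_LEN characters
        input_vector = torch.tensor([[vocab.get(c, len(vocab)) for c in context]])
        if USE_CUDA and torch.cuda.is_available():
            input_vector = input_vector.cuda()
        with torch.no_grad():
            output_vector = lstm(input_vector)          # shape (1, vocab_size): ONE prediction
        next_idx = int(output_vector[0].argmax())       # greedy: take the top choice
        generated += inv_vocab[next_idx]                # append a SINGLE character
    return generated

The output has shape (1, vocab_size), i.e. one prediction for the next character, not a batch of new characters, which is exactly where my original logic went wrong.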
Anyways, if you run into this problem DOUBLE CHECK that you have the right logic for producing new output with the trained model (especially if you're using batches). If it's not that and the problem persists, you can try fine-tuning the following:
sequence length
greediness (e.g. probabilistic choice vs. top choice for the next character; see the sketch after this list)
batch size
epochs
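On the greediness point, here is a sketch of what a probabilistic choice could look like in place of a plain argmax (the pick_next_index helper and the temperature value are just illustrations, not code from my project):

import torch

def pick_next_index(output_vector, temperature=0.8):
    # output_vector[0] holds per-character probabilities (sigmoid outputs in the BCE setup)
    probs = output_vector[0].clamp(min=1e-8)
    logits = torch.log(probs) / temperature              # rescale before re-normalizing
    probs = torch.softmax(logits, dim=0)
    return int(torch.multinomial(probs, num_samples=1))  # sample instead of taking the top choice

Lower temperatures behave almost like taking the top choice; higher ones add variety (and more garbage), so it's worth tuning alongside the other knobs above.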
Upvotes: 4