Reputation: 71
So I have an assignment involving Language Modelling and I passed all the unit tests, but my code is too slow to run. I think it's because of the way I compute my loss. The formula we're given is the average, over the batch, of each sequence's mean negative log-likelihood:

$$\mathcal{L} = \frac{1}{B} \sum_{b=1}^{B} \left( -\frac{1}{T_b} \sum_{t=1}^{T_b} \log p(w_{b,t} \mid w_{b,<t}) \right)$$

where $B$ is the batch size and $T_b$ is the unpadded length of sequence $b$.
My naive implementation is the following:
losses_batch_list = []
batch_size = log_probas.size(0)
for b in range(batch_size):
    # the last nonzero mask entry gives the unpadded length of sequence b
    seq_length = max([i for i, e in enumerate(mask[b, :]) if e != 0]) + 1
    loss_batch = 0
    for t in range(seq_length):
        for n in range(self.vocabulary_size):
            if targets[b, t] == n:
                loss_batch += log_probas[b, t, n].detach()
    loss_batch = -loss_batch / seq_length
    losses_batch_list.append(loss_batch)
loss = torch.tensor(np.mean(losses_batch_list))
return loss
But that loop runs forever, since the vocabulary size is approximately the same as GPT-1's (~40,000) and the sequence length is up to 255 (sometimes it is shorter because of padding, hence the mask). Does anyone have any tips on how to vectorize/speed this up? I know it's correct, but I can't report any results with it... Thanks!
Upvotes: 0
Views: 50
Reputation: 188
# Notation:
#   B = batch_size
#   T = sequence_length (padded)
#   N = vocab_size
# a boolean mask is needed for indexing; cast it if it isn't one already
if mask.dtype != torch.bool:
    mask = mask.bool()
mask = mask.view(-1)                    # (B, T) -> (B*T,)
log_probas = log_probas.view(-1, N)     # (B, T, N) -> (B*T, N)
targets = targets.view(-1, 1)           # (B, T) -> (B*T, 1)
# pick each target token's log-probability, dropping padded positions
loss = torch.gather(log_probas[mask], -1, targets[mask])
loss = -loss.mean()                     # negate to get the NLL
Upvotes: 1