SolipsistElvis

Reputation: 71

How to vectorize the loss for an LSTM doing sequential language modelling

So I have an assignment involving language modelling and I passed all the unit tests, but my code is too slow to run. I think it's because of the way I compute my loss. The formula we're given is the following:

[image: loss formula]
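
Written out, the quantity the loop below computes for one batch is (with $T_b$ the unpadded length of sequence $b$ and $|V|$ the vocabulary size):

$$\mathcal{L} \;=\; \frac{1}{B} \sum_{b=1}^{B} \left( -\frac{1}{T_b} \sum_{t=1}^{T_b} \sum_{n=1}^{|V|} \mathbb{1}[y_{b,t} = n] \, \log p_\theta(n \mid y_{b,<t}) \right)$$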

My naive implementation is the following:

losses_batch_list = []
batch_size = log_probas.size(0)
for b in range(batch_size):

    # index of the last non-padded token + 1
    seq_length = max([i for i, e in enumerate(mask[b, :]) if e != 0]) + 1

    loss_batch = 0
    for t in range(seq_length):
        for n in range(self.vocabulary_size):
            if targets[b, t] == n:
                loss_batch += log_probas[b, t, n].detach()   # detached, so no gradient flows through this loss

    loss_batch = - loss_batch / seq_length   # per-sequence average of the target log-probs, negated
    losses_batch_list.append(loss_batch)

loss = torch.tensor(np.mean(losses_batch_list))   # average over the batch

return loss

But that loop runs forever, since the vocabulary size is roughly the same as GPT-1's (~40,000) and the sequence length is up to 255 (sometimes it is shorter because of padding, hence the mask). Does anyone have any tips on how to vectorize/speed this up? I know it's correct, but I can't report any results with it... Thanks!
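
The only thing I've spotted so far is that the inner loop over n only ever adds a single term, so for one (b, t) it should boil down to something like

loss_batch += log_probas[b, t, targets[b, t]].detach()

but I still don't see how to get rid of the loops over t and b.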

Upvotes: 0

Views: 50

Answers (1)

emily

Reputation: 188

# Shapes: B = batch size, T = sequence length (padded), N = vocab size
N = log_probas.size(-1)                     # vocabulary size

if mask.dtype == torch.bool:
    mask = mask.view(-1)                    # (B, T) -> (B*T,)
else:
    mask = mask.bool().view(-1)             # (B, T) -> (B*T,)
log_probas = log_probas.view(-1, N)         # (B, T, N) -> (B*T, N)
targets = targets.view(-1, 1)               # (B, T)    -> (B*T, 1)
loss = torch.gather(log_probas[mask], -1, targets[mask])  # target log-probs, padded tokens dropped
loss = -loss.mean()                         # mean negative log-likelihood over non-padded tokens
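
As a quick sanity check, here is a minimal, self-contained sketch of the same idea with toy shapes and made-up values (assuming mask is 1 for real tokens and 0 for padding):

import torch

# toy sizes; the real vocab is ~40k, but small numbers are enough to check shapes
B, T, N = 2, 5, 7
log_probas = torch.log_softmax(torch.randn(B, T, N), dim=-1)   # (B, T, N)
targets = torch.randint(0, N, (B, T))                          # (B, T)
mask = torch.tensor([[1, 1, 1, 0, 0],                          # 1 = real token, 0 = padding
                     [1, 1, 1, 1, 1]])

mask = mask.bool().view(-1)                                    # (B*T,)
log_probas = log_probas.view(-1, N)                            # (B*T, N)
targets = targets.view(-1, 1)                                  # (B*T, 1)
loss = -torch.gather(log_probas[mask], -1, targets[mask]).mean()
print(loss)                                                    # scalar, roughly log(N) for random predictions

One thing to be aware of: this averages over all non-padded tokens at once, while your loop averages each sequence first and then averages over the batch. The two only match when every sequence has the same unpadded length, so longer sequences get slightly more weight here.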

Upvotes: 1
