Reputation: 51
My Python code works OK for base transformer models, but when I attempt to use 'large' models or RoBERTa models I receive error messages. I've printed the most common one below.
Epoch 1 / 40
RuntimeError                              Traceback (most recent call last)
<ipython-input-...> in <module>()
     12
     13 # train model
---> 14 train_loss, _ = fine_tune()
     15 # WE DON'T CARE ABOUT THE SECOND ITEM THE MODEL OUTPUTS (total_preds)
     16 # We only want the average loss values here 'avg_loss'

5 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
   1688     if input.dim() == 2 and bias is not None:
   1689         # fused op is marginally faster
-> 1690         ret = torch.addmm(bias, input, weight.t())
   1691     else:
   1692         output = input.matmul(weight.t())
RuntimeError: mat1 dim 1 must match mat2 dim 0
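For what it's worth, a toy example (hypothetical, not my actual code) seems to trigger the same kind of error when a linear layer's expected input size doesn't match the incoming tensor:

import torch
import torch.nn as nn

# hypothetical sketch: a classifier head sized for a 768-dim hidden state
head = nn.Linear(768, 2)

# a batch of 8 hidden states of size 1024, as a 'large' model would emit
hidden = torch.randn(8, 1024)

# raises a shape-mismatch RuntimeError (exact wording varies by torch version)
out = head(hidden)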
I am guessing there is some kind of mismatch between matrices (tensors) such that an operation cannot occur. If I can better understand the issue, I can better address the necessary changes to my code. Here is the fine-tuning function I am using...
import numpy as np
import torch

# model, optimizer, train_dataloader, device and cross_entropy are
# defined elsewhere in the script
def fine_tune():
    model.train()
    total_loss, total_accuracy = 0, 0
    total_preds = []
    for step, batch in enumerate(train_dataloader):
        # progress update after every 50 batches
        if step % 50 == 0 and not step == 0:
            print('  Batch {:>5,}  of  {:>5,}.'.format(step, len(train_dataloader)))
        # push the batch to the gpu
        batch = [r.to(device) for r in batch]
        sent_id, mask, labels = batch
        # clear previously calculated gradients
        model.zero_grad()
        # get model predictions for the current batch
        preds = model(sent_id, mask)
        # compute the loss between actual and predicted values
        loss = cross_entropy(preds, labels)
        # add on to the total loss
        total_loss = total_loss + loss.item()
        # backward pass to calculate the gradients
        loss.backward()
        # clip the gradients to 1.0; this helps prevent the exploding-gradient problem
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        # update parameters
        optimizer.step()
        # model predictions are stored on the GPU, so push them to the CPU
        preds = preds.detach().cpu().numpy()
        # length of preds is the same as the batch size;
        # append the model predictions
        total_preds.append(preds)
    # average the loss over all batches and stack the per-batch predictions
    avg_loss = total_loss / len(train_dataloader)
    total_preds = np.concatenate(total_preds, axis=0)
    return avg_loss, total_preds
Regards, Mark
Upvotes: 1
Views: 96
Reputation: 51
I added a print statement to reveal the size of the input coming from the pre-trained model. This revealed the true size, namely 1024, rather than the hard-coded default of 768 in the program I had modified. An easy fix once I understood the problem. The moral of the story for me is: when a YouTuber (a good one, actually!) says "all transformers have an output dimension of 768", don't necessarily take that as gospel!
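For anyone hitting the same thing, here is a minimal sketch of the check, assuming a Hugging Face transformers backbone ('roberta-large' and the 2-label head are only illustrative):

from transformers import AutoModel
import torch.nn as nn

# load the pre-trained backbone; 'roberta-large' is only an example checkpoint
backbone = AutoModel.from_pretrained('roberta-large')

# read the real hidden size from the model's config instead of assuming 768
hidden_size = backbone.config.hidden_size
print(hidden_size)  # 1024 for 'large' checkpoints, 768 for 'base' ones

# size the classifier head from the config so it always matches the backbone
classifier = nn.Linear(hidden_size, 2)  # 2 output labels, illustrative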
Upvotes: 1