Mark Padley

Reputation: 51

Unable to use existing code working with base transformers on 'large' models

My Python code works OK for base transformer models, but when I attempt to use 'large' models, or roberta models, I receive error messages. The most common message is shown below.

Epoch 1 / 40

RuntimeError                              Traceback (most recent call last)
in ()
     12
     13 # train model
---> 14 train_loss, _ = fine_tune()
     15 # WE DON'T CARE ABOUT THE SECOND ITEM THE MODEL OUTPUTS (total_preds)
     16 # We only want the average loss value here, 'avg_loss'

5 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
   1688     if input.dim() == 2 and bias is not None:
   1689         # fused op is marginally faster
-> 1690         ret = torch.addmm(bias, input, weight.t())
   1691     else:
   1692         output = input.matmul(weight.t())

RuntimeError: mat1 dim 1 must match mat2 dim 0

I am guessing there is some kind of mismatch between matrices (tensors) such that an operation cannot occur. If I can better understand the issue, I can better address the necessary changes to my code. Here is the fine-tuning function I am using...

def fine_tune():

    model.train()

    total_loss, total_accuracy = 0, 0

    # empty list to save model predictions
    total_preds = []

    # iterate over batches
    for step, batch in enumerate(train_dataloader):

        # progress update after every 50 batches
        if step % 50 == 0 and not step == 0:
            print('  Batch {:>5,}  of  {:>5,}.'.format(step, len(train_dataloader)))

        # push the batch to the gpu
        batch = [r.to(device) for r in batch]

        sent_id, mask, labels = batch

        # clear previously calculated gradients
        model.zero_grad()

        # get model predictions for the current batch
        preds = model(sent_id, mask)

        # compute the loss between actual and predicted values
        loss = cross_entropy(preds, labels)

        # add on to the total loss
        total_loss = total_loss + loss.item()

        # backward pass to calculate the gradients
        loss.backward()

        # clip the gradients to 1.0; helps prevent the exploding-gradient problem
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        # update parameters
        optimizer.step()

        # model predictions are stored on the GPU, so push them to the CPU
        preds = preds.detach().cpu().numpy()
        # length of preds is the same as the batch size

        # append the model predictions
        total_preds.append(preds)

    # compute the average training loss of the epoch
    avg_loss = total_loss / len(train_dataloader)

    # reshape the predictions into (number of samples, number of classes)
    total_preds = np.concatenate(total_preds, axis=0)

    return avg_loss, total_preds
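
For reference, the same error can be reproduced in isolation when a Linear layer receives an input whose feature dimension doesn't match the layer's expected input size. The sketch below is only my guess at the kind of mismatch involved; the sizes 768 and 1024 are illustrative, not something I have confirmed in my own model.

import torch
import torch.nn as nn

# a linear layer sized for 768 input features (the 'base' hidden size)
fc = nn.Linear(768, 512)

# a batch of 8 vectors with 1024 features (the 'large' hidden size)
x = torch.randn(8, 1024)

fc(x)  # raises a RuntimeError from the matrix multiply about mismatched dimensions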

Regards, Mark

Upvotes: 1

Views: 96

Answers (1)

Mark Padley

Reputation: 51

I wrote a print statement to reveal the size of the input coming from the pre-trained model. This revealed the true size, namely 1024, rather than the hard-coded default value of 768 in the program I had modified. An easy fix once I understood the problem. The moral of the story for me is: when a YouTuber (a good one, actually!) says "all transformers have an output dimension of 768", don't necessarily take that as gospel!
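
For anyone who hits the same wall, this is roughly the check I mean; a minimal sketch assuming the Hugging Face transformers library, where 'bert-large-uncased' is just an example checkpoint name:

from transformers import AutoModel

bert = AutoModel.from_pretrained('bert-large-uncased')

# the hidden size lives on the model's config, so there is no need to hard-code it
print(bert.config.hidden_size)   # 1024 for 'large' checkpoints, 768 for 'base'

Building the classification head from bert.config.hidden_size instead of a literal 768 lets the same code run against base, large, and roberta checkpoints.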

Upvotes: 1
