Reputation: 53
I am trying to build a BERT model for an Arabic text classification task, using the pretrained model from https://github.com/alisafaya/Arabic-BERT. I want to know the exact difference between the following two statements:
model_name = 'kuisailab/albert-large-arabic'
model = AutoModel.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
I fine-tuned the model by freezing its parameters and adding the following layers on top of it:
for param in model.parameters():
    param.requires_grad = False

model.classifier = nn.Sequential(
    nn.Dropout(0.5),
    nn.ReLU(),
    nn.Linear(768, 512),
    nn.Linear(512, 2),
    nn.LogSoftmax(dim=1),
    nn.Softmax(dim=1)
)
model = model.to(device)
and used the optimizer:
optimizer = AdamW(model.parameters(), lr=2e-5)
Finally, this is my training loop:
model.train()
for idx, row in train_data.iterrows():
    text_parts = preprocess_text(str(row['sentence']))
    label = torch.tensor([row['label']]).long().to(device)
    optimizer.zero_grad()
    overall_output = torch.zeros((1, 2)).float().to(device)
    for part in text_parts:
        if len(part) > 0:
            try:
                input = part.reshape(-1)[:512].reshape(1, -1)
                # print(input.shape)
                overall_output += model(input, labels=label)[1].float().to(device)
            except Exception as e:
                print(str(e))
    # overall_output /= len(text_parts)
    overall_output = F.softmax(overall_output[0], dim=-1)
    if label == 0:
        label = torch.tensor([1.0, 0.0]).float().to(device)
    elif label == 1:
        label = torch.tensor([0.0, 1.0]).float().to(device)
    # print(overall_output, label)
    loss = criterion(overall_output, label)
    total_loss += loss.item()
    loss.backward()
    optimizer.step()
and I get the error:
mat1 dim 1 must match mat2 dim 0
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-33-5c2f0fea6c1f> in <module>()
39 total_loss += loss.item()
40
---> 41 loss.backward()
42 optimizer.step()
43
1 frames
/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
130 Variable._execution_engine.run_backward(
131 tensors, grad_tensors_, retain_graph, create_graph,
--> 132 allow_unreachable=True) # allow_unreachable flag
133
134
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Any idea how to solve this error?
Upvotes: 1
Views: 1010
Reputation: 7369
BertForSequenceClassification is the class which extends BertModel, i.e., BertForSequenceClassification defines a logistic regression layer, for the task of classification, with cross-entropy loss, to be jointly fine-tuned or trained on top of the existing BERT model.
AutoModel is a class provided in the library that automatically identifies the correct model class based on the checkpoint's name or the model file contents.
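For illustration, here is a minimal sketch of the two calls. It assumes the asafaya/bert-base-arabic checkpoint from the repository you linked (note that kuisailab/albert-large-arabic is an ALBERT checkpoint, so BertForSequenceClassification is not the matching class for it; AutoModelForSequenceClassification would resolve the right one automatically):

import torch
from transformers import AutoModel, AutoTokenizer, BertForSequenceClassification

model_name = 'asafaya/bert-base-arabic'  # assumed BERT checkpoint id from that repo
tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer('جملة للتجربة', return_tensors='pt')  # any Arabic sentence

# AutoModel resolves the architecture from the checkpoint config and returns
# the bare encoder: no task head, its forward pass yields hidden states only.
encoder = AutoModel.from_pretrained(model_name)
hidden_states = encoder(**inputs)[0]  # (batch_size, seq_len, hidden_size)

# BertForSequenceClassification is BertModel plus a classification head;
# when labels are passed, its forward pass also returns the cross-entropy loss.
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)
loss, logits = model(**inputs, labels=torch.tensor([1]))[:2]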
Since you already know that you need a model for classification, you can directly use BertForSequenceClassification.
Upvotes: 1