Reputation: 61
I'm running:
# original training script
import transformers

trainer = transformers.Trainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,  # turn on the eval dataset for comparisons
    args=transformers.TrainingArguments(
        num_train_epochs=2,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=1,
        warmup_ratio=0.05,
        max_steps=20,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit",
        lr_scheduler_type="cosine",
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
I'm not 100% sure, but I think the loss shown is against the training dataset rather than the eval dataset...
How do I show the loss against the eval set (and ideally the training set too)?
I would have expected that adding eval_dataset was enough...
Upvotes: 0
Views: 480
Reputation: 1
You can pass a compute_metrics function to your Trainer:
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def compute_metrics(p):
    # p is an EvalPrediction: (predictions, label_ids)
    pred, labels = p
    pred = np.argmax(pred, axis=1)
    accuracy = accuracy_score(y_true=labels, y_pred=pred)
    recall = recall_score(y_true=labels, y_pred=pred)
    precision = precision_score(y_true=labels, y_pred=pred)
    f1 = f1_score(y_true=labels, y_pred=pred)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
)
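Note that the Trainer only runs evaluation during training (and therefore only logs eval_loss and these metrics) when an evaluation schedule is set in TrainingArguments. A minimal sketch, assuming a reasonably recent transformers version where TrainingArguments accepts evaluation_strategy and eval_steps (the other values just mirror the question's setup):

args = transformers.TrainingArguments(
    output_dir="outputs",
    max_steps=20,
    logging_steps=1,
    evaluation_strategy="steps",  # run evaluation periodically during training
    eval_steps=5,                 # evaluate every 5 optimizer steps
    per_device_eval_batch_size=1,
)

You can also call trainer.evaluate() directly after training; it returns a dict containing eval_loss along with whatever compute_metrics returns, with the keys prefixed by "eval_".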
Upvotes: 1