Ronan McGovern

Reputation: 61

With a HuggingFace trainer, how do I show the training loss versus the eval data set?

I'm running:

#original training script

trainer = transformers.Trainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=test_dataset, #turn on the eval dataset for comparisons
    args=transformers.TrainingArguments(
        num_train_epochs=2,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=1,
        warmup_ratio=0.05,
        max_steps=20,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit",
        lr_scheduler_type='cosine',
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!

I'm not 100% clear, but I think the loss shown at each training step is computed on the training dataset rather than the eval dataset.

(Screenshot: training steps and losses)

How do I show the loss on the eval set (and on the training set too, ideally)?

I would have expected adding eval_dataset was enough...

Upvotes: 0

Views: 480

Answers (1)

PromptCloud

Reputation: 1

You can pass a compute_metrics function to your Trainer:

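import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def compute_metrics(p):
    # p is an EvalPrediction, which unpacks into (predictions, label_ids)
    pred, labels = p
    # take the highest-scoring class for each example
    pred = np.argmax(pred, axis=1)
    # sklearn's defaults for recall, precision and f1 assume binary labels
    accuracy = accuracy_score(y_true=labels, y_pred=pred)
    recall = recall_score(y_true=labels, y_pred=pred)
    precision = precision_score(y_true=labels, y_pred=pred)
    f1 = f1_score(y_true=labels, y_pred=pred)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}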

from transformers import Trainer

# args is your TrainingArguments instance
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics
)
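
Note that passing eval_dataset alone does not trigger evaluation during training, because the Trainer's evaluation strategy defaults to "no". A minimal sketch of one way to get both curves, assuming a recent transformers release where the argument is named eval_strategy (older releases call it evaluation_strategy): enable step-wise evaluation, then read both losses back from trainer.state.log_history. The names EVAL_KWARGS and split_losses, the eval_steps value, and the sample records are hypothetical, made up for illustration.

```python
# Hypothetical values: extra TrainingArguments keyword arguments that turn on
# periodic evaluation. Recent transformers releases name the first key
# `eval_strategy`; older releases use `evaluation_strategy` instead.
EVAL_KWARGS = {
    "eval_strategy": "steps",  # evaluate every `eval_steps` optimizer steps
    "eval_steps": 5,           # assumed interval; choose one that suits max_steps
    "logging_steps": 1,        # already set in the question
}


def split_losses(log_history):
    """Split Trainer-style log records into (step, loss) pairs per dataset."""
    train, evaluation = [], []
    for record in log_history:
        # training log records carry a "loss" key
        if "loss" in record:
            train.append((record["step"], record["loss"]))
        # evaluation records carry an "eval_loss" key
        if "eval_loss" in record:
            evaluation.append((record["step"], record["eval_loss"]))
    return train, evaluation


# Made-up records in the shape of trainer.state.log_history
sample = [
    {"loss": 2.1, "learning_rate": 2e-4, "epoch": 0.1, "step": 1},
    {"loss": 1.9, "epoch": 0.2, "step": 2},
    {"eval_loss": 2.0, "eval_runtime": 1.5, "epoch": 0.2, "step": 2},
    {"train_loss": 2.0, "train_runtime": 10.0, "epoch": 0.2, "step": 2},
]
train_curve, eval_curve = split_losses(sample)
print("train:", train_curve)
print("eval:", eval_curve)
```

In a real run you would pass **EVAL_KWARGS into transformers.TrainingArguments alongside your other settings and call split_losses(trainer.state.log_history) after trainer.train(). The eval loss is reported whenever the eval batches contain labels, so compute_metrics is only needed for extra metrics such as accuracy.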

Upvotes: 1
