Reputation: 12679
I am trying to build a question answering pipeline with the Hugging Face framework, but I am running into a KeyError: 'eval_loss' error. My goal is to train, save the best model at the end, and then evaluate the validation set with the loaded model. My trainer configuration looks like this:
args = TrainingArguments(
    "model_training",
    evaluation_strategy="epoch",
    label_names=["start_positions", "end_positions"],
    logging_steps=1,
    learning_rate=2e-5,
    num_train_epochs=epochs,
    save_total_limit=2,
    load_best_model_at_end=True,
    save_strategy="epoch",
    logging_strategy="epoch",
    report_to="none",
    weight_decay=0.01,
    fp16=True,
    push_to_hub=False,
)
While training, I get this error:
Traceback (most recent call last):
File "qa_pipe.py", line 286, in <module>
pipe.training(train_d, val_d, epochs = 2)
File "qa_pipe.py", line 263, in training
self.trainer.train()
File "/home/admin/qa/lib/python3.7/site-packages/transformers/trainer.py", line 1505, in train
ignore_keys_for_eval=ignore_keys_for_eval,
File "/home/admin/qa/lib/python3.7/site-packages/transformers/trainer.py", line 1838, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
File "/home/admin/qa/lib/python3.7/site-packages/transformers/trainer.py", line 2090, in _maybe_log_save_evaluate
self._save_checkpoint(model, trial, metrics=metrics)
File "/home/admin/qa/lib/python3.7/site-packages/transformers/trainer.py", line 2193, in _save_checkpoint
metric_value = metrics[metric_to_check]
KeyError: 'eval_loss'
A minimal working example is provided on Colab.
How can I avoid this error and save the best model at the end?
Upvotes: 3
Views: 4096
Reputation: 740
When the Trainer decides whether the model at the current checkpoint is better than previous checkpoint(s), it looks up the key "eval_loss" in the evaluation metrics. That key is built from the TrainingArguments parameter metric_for_best_model together with the metric_key_prefix that Trainer.evaluate() prepends to every metric name (see https://huggingface.co/docs/transformers/main/main_classes/trainer#trainingarguments).
I ran into this problem because I was using multiple evaluation datasets, none of which was called "eval", together with a custom metric (passed via the compute_metrics argument of the Trainer) that was not called "loss". I fixed it by passing:
my_metric_name = "wer"    # for example, for ASR;
                          # make sure your compute_metrics function returns this key
my_dataset_name = "dev"   # for example; make sure one of the datasets in the DatasetDict
                          # passed to the eval_dataset argument of Trainer is called "dev"

training_args = TrainingArguments(
    output_dir="./output_dir",
    ...,
    metric_for_best_model="eval_" + my_dataset_name + "_" + my_metric_name,
)
Upvotes: 0
Reputation: 11
See the prediction_step function of the Trainer class.
At a high level, it checks whether the input to the model (the dict your data collator returns) contains "labels", which should be the targets for your prediction. Alternatively, it checks whether the input contains a key "return_loss".
If your inputs have labels or "return_loss" = True, the function computes the loss and returns it properly; otherwise it returns None for the loss, so "eval_loss" never appears in the evaluation metrics.
I see in your code that you are only using the library at a high level, so this might not help you directly, but I suppose the easiest fix is to write a custom data collator that adds the entry "return_loss" = True to the input dict.
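As a rough illustration of that idea, the sketch below wraps transformers' default_data_collator and adds the flag. Whether this actually yields an eval loss depends on your model's forward() accepting the extra return_loss key and being able to compute a loss, so treat it as a starting point rather than a drop-in fix:

from transformers import default_data_collator

def collator_with_return_loss(features):
    # Build the batch as usual, then ask for a loss so that
    # Trainer.prediction_step can return a value for "eval_loss".
    batch = default_data_collator(features)
    batch["return_loss"] = True
    return batch

# then pass it to the Trainer:
# trainer = Trainer(..., data_collator=collator_with_return_loss)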
Upvotes: 1