Reputation: 200
I am trying to fine-tune a BERT model on Azure ML. I am using Azure ML Jobs to log the metrics, hyperparameters, and models through the Python MLflow API. However, the train method of the transformers.Trainer class raises the following exception with status code 400:
    mlflow.exceptions.RestException: INVALID_PARAMETER_VALUE:
    Response: {'Error': {'Code': 'ValidationError', 'Severity': None,
    'Message': 'No more than 500 characters per params Value. Request contains 2 of greater length.'
    ...
    }
This code ran without issues on another compute target that logged to a self-hosted MLflow tracking server, which leads me to believe the problem is related to experiment tracking on Azure ML Jobs.
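For what it's worth, I expect the same error can be triggered without the Trainer by logging an over-long parameter directly (minimal sketch; the parameter name is made up, and it assumes the run is tracked by the Azure ML backend, as it is inside an AML Job):

    import mlflow

    with mlflow.start_run():
        # Any value longer than 500 characters should hit the same
        # INVALID_PARAMETER_VALUE response from the Azure ML backend.
        mlflow.log_param("dummy_long_param", "x" * 600)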
Can someone please help me fix this?
These are my package versions: azure-core==1.29, azureml-core==1.55, azureml-mlflow==1.55, mlflow==2.11, transformers==4.33.0. I am on Python 3.10.
Upvotes: 4
Views: 833
Reputation: 652
For MLflow v2.8.0 and higher, the maximum length of a parameter value is 6000 (#9709). It appears that, at the time of writing, the Azure ML backend store for MLflow still enforces a maximum length of 500.
The MLflow integration in transformers contains some logic to discard parameters whose values exceed the maximum, but it checks against mlflow.utils.validation.MAX_PARAM_VAL_LENGTH (6000), which is too high when Azure ML is used as the backend store. In my case, the id2label and label2id parameters logged by the Trainer instance were too long, because my model had a lot of labels.
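You can check whether these parameters are the culprit in your setup by measuring their serialized length; a minimal sketch (the checkpoint and label count are placeholders):

    from transformers import AutoConfig

    # Placeholder checkpoint and label count; substitute your own model.
    config = AutoConfig.from_pretrained("bert-base-uncased", num_labels=200)

    # The Trainer logs these as stringified parameters; anything longer than
    # 500 characters is rejected by the Azure ML backend store.
    for name in ("id2label", "label2id"):
        print(name, len(str(getattr(config, name))))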
I found two possible workarounds:

1. Remove the MLflow callback handler from the Trainer (before calling .train()). Note that the Trainer then no longer logs anything to MLflow; see the manual-logging sketch after the second workaround.

    from transformers.integrations import MLflowCallback

    trainer = ...  # your existing Trainer setup

    # Drop the MLflow callback so the Trainer stops logging params/metrics to MLflow
    trainer.remove_callback(MLflowCallback)
    trainer.train()
2. Set mlflow.utils.validation.MAX_PARAM_VAL_LENGTH to 500 (before instantiating the trainer), so that the transformers integration itself drops the parameters that Azure ML would reject:

    import mlflow

    # Lower the limit so the transformers MLflow integration discards any
    # parameter value longer than 500 characters before sending it.
    mlflow.utils.validation.MAX_PARAM_VAL_LENGTH = 500

    trainer = ...  # instantiate the Trainer only after patching the limit
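With the first workaround, any hyperparameters you still want in MLflow have to be logged manually. A minimal sketch, truncating values to Azure ML's 500-character limit (the selection of parameters and the training_args name are my own illustration, not something the Trainer does for you):

    import mlflow

    # `training_args` is assumed to be your transformers TrainingArguments instance.
    params = {
        "learning_rate": training_args.learning_rate,
        "num_train_epochs": training_args.num_train_epochs,
        "per_device_train_batch_size": training_args.per_device_train_batch_size,
    }
    # Truncate each value to stay within the 500-character limit enforced by Azure ML.
    mlflow.log_params({k: str(v)[:500] for k, v in params.items()})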
Upvotes: 2