Reputation: 51
I got a custom_model_training_cpus error when running a submitted pipeline on Vertex AI, and I could not find any documentation about it. I am using n1-standard-4 as the deployment machine and do not see any issue with it. Any comments would be much appreciated.
com.google.cloud.ai.platform.common.errors.AiPlatformException: code=RESOURCE_EXHAUSTED, message=The following quota metrics exceed quota limits: aiplatform.googleapis.com/custom_model_training_cpus, cause=null; Failed to create custom job for the task.
Upvotes: 5
Views: 5983
Reputation: 31
Since this is training quota, as @ironpan noted, one option would be to wait until the current pipelines in your GCP project are complete and then try again. You could also try training again but using n1-standard-2, which has two fewer CPUs, or updating the pipeline to run fewer containers at the same time.
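To make the arithmetic behind those options concrete, here is a minimal sketch of how many pipeline tasks could run at once under a given CPU quota. It assumes each task runs on a single machine and that the predefined machine type name ends in its vCPU count (as n1-standard-4 and n1-standard-2 do). The helper names and the quota value of 8 are hypothetical; check your project's actual limit in the console.

```python
import re


def vcpus_for(machine_type: str) -> int:
    """Read the vCPU count from a predefined machine type name, e.g. n1-standard-4 -> 4."""
    match = re.fullmatch(r"[a-z0-9]+-(?:standard|highmem|highcpu)-(\d+)", machine_type)
    if match is None:
        raise ValueError(f"cannot infer vCPU count from {machine_type!r}")
    return int(match.group(1))


def max_concurrent_tasks(machine_type: str, quota_cpus: int) -> int:
    """How many single-machine tasks fit under the CPU quota at the same time."""
    return quota_cpus // vcpus_for(machine_type)


def fits_quota(machine_type: str, concurrent_tasks: int, quota_cpus: int) -> bool:
    """True if the requested tasks stay within the CPU quota."""
    return vcpus_for(machine_type) * concurrent_tasks <= quota_cpus


if __name__ == "__main__":
    quota = 8  # hypothetical limit, not a real default
    for machine in ("n1-standard-4", "n1-standard-2"):
        print(machine, "->", max_concurrent_tasks(machine, quota), "concurrent tasks")
```

Under these assumptions, halving the machine size doubles the number of tasks that fit, which is why a smaller machine type or fewer parallel containers can both clear the error.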
Alternatively, as @harshit-saini answered, you can edit quotas at the link in that answer. To find the quota that is exhausted, enter the quota name from the error message, aiplatform.googleapis.com/custom_model_training_cpus, into the table filter. In the table, select the quota(s) you want to edit and then click the "Edit quotas" button to request a higher quota.
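If you hit this error often, a small helper can pull the quota name out of the message so you can paste it into the filter. This is only a sketch: the function name is hypothetical, and it assumes that when several metrics are exhausted they appear comma-separated after "exceed quota limits:", which the single-metric message in the question does not confirm.

```python
import re

# Error text copied from the question.
ERROR = (
    "com.google.cloud.ai.platform.common.errors.AiPlatformException: "
    "code=RESOURCE_EXHAUSTED, message=The following quota metrics exceed "
    "quota limits: aiplatform.googleapis.com/custom_model_training_cpus, "
    "cause=null; Failed to create custom job for the task."
)


def exhausted_quota_metrics(message: str) -> list[str]:
    """Return the quota metric names listed in a RESOURCE_EXHAUSTED message."""
    # Capture everything between "exceed quota limits:" and the ", cause=" suffix.
    match = re.search(r"exceed quota limits:\s*(.*?)(?:, cause=|;|$)", message)
    if match is None:
        return []
    # Assumption: multiple metrics, if present, are comma-separated.
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


if __name__ == "__main__":
    for metric in exhausted_quota_metrics(ERROR):
        print(metric)
```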
Also, as @nestor-ceniza-jr noted, quotas are per region, so you can always switch to a different region as well, such as a region that has a higher quota limit. However, the resulting model will be in whatever region you train in, in case that's an issue for you.
Upvotes: 0
Reputation: 31
Vertex Pipelines uses the Vertex training service to execute its containers, so the exhausted resource is the Vertex training quota. You can find more details about this quota at https://cloud.google.com/vertex-ai/docs/quotas#training.
Upvotes: 0
Reputation: 369
Quotas can be edited through this link: https://console.cloud.google.com/apis/api/aiplatform.googleapis.com/quotas
If you are not able to edit the quotas, you may need to contact GCP's customer service.
Upvotes: 0