Racim Righi

Reputation: 31

How to run multiple custom jobs at the same time in Vertex AI?

We run custom training jobs in Vertex AI, scheduled once a week with Airflow. The jobs are submitted to Vertex AI at the same time, but they run sequentially (one at a time). Each job takes around 10 minutes, and while it runs the other 20+ jobs stay pending.

Since we submit the custom jobs at the same time, we expected them to run at least in batches (for example, 5 at a time), but they are started one after another. This is the Vertex AI config that we are using:

{
    "displayName": display_name,
    "trainingTaskDefinition": PREDICTION_JOB_SCHEMA_URI,
    "trainingTaskInputs": {
        "serviceAccount": VERTEX_SERVICE_ACCOUNT,
        "workerPoolSpecs": [
            {
                "machineSpec": {
                    "machineType": "n2-standard-16",
                },
                "replicaCount": 1,
                "pythonPackageSpec": {
                    "executorImageUri": PREDICTION_EXECUTOR_IMAGE_URI,
                    "packageUris": task_params["package_uris"],
                    "pythonModule": task_params["python_module"],
                    "args": task_params["args"],
                    "env": task_params["envs"],
                },
            }
        ],
    },
}
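
For illustration, here is a minimal Python sketch of how one such payload per job could be assembled before submission. The helper name `build_job_payload`, the placeholder constants, and the sample task parameters are assumptions made for this example, not values from our setup, and the actual submission call is left out.

```python
# Hypothetical placeholders: the real values are not shown in the question.
PREDICTION_JOB_SCHEMA_URI = "<custom-task-schema-uri>"
PREDICTION_EXECUTOR_IMAGE_URI = "<executor-image-uri>"
VERTEX_SERVICE_ACCOUNT = "<service-account-email>"


def build_job_payload(display_name, task_params):
    """Build the request body for one custom job (hypothetical helper)."""
    return {
        "displayName": display_name,
        "trainingTaskDefinition": PREDICTION_JOB_SCHEMA_URI,
        "trainingTaskInputs": {
            "serviceAccount": VERTEX_SERVICE_ACCOUNT,
            "workerPoolSpecs": [
                {
                    "machineSpec": {"machineType": "n2-standard-16"},
                    "replicaCount": 1,
                    "pythonPackageSpec": {
                        "executorImageUri": PREDICTION_EXECUTOR_IMAGE_URI,
                        "packageUris": task_params["package_uris"],
                        "pythonModule": task_params["python_module"],
                        "args": task_params["args"],
                        "env": task_params["envs"],
                    },
                }
            ],
        },
    }


# Sample task parameters, made up for the example.
sample_task_params = {
    "package_uris": ["gs://<bucket>/<package>.tar.gz"],
    "python_module": "<package>.task",
    "args": [],
    "envs": [],
}

# One independent payload per job; each would be submitted separately.
payloads = [
    build_job_payload(f"prediction-job-{i}", sample_task_params)
    for i in range(3)
]
```

Every payload requests a single `n2-standard-16` replica, so the jobs do not depend on each other in the config itself.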

Upvotes: 1

Views: 334

Answers (0)
