Jiho Choi

Reputation: 1311

Vertex AI custom job to run python-module with pre-built containers (using gcloud CLI)

I am migrating a model that was previously running on GCP AI Platform over to Vertex AI [1, 2].

My setup is as follows.

Can someone tell me if there is something wrong with the sequence of steps below?

The Python module itself does not seem to be the cause of the problem, since the same code currently runs fine on AI Platform.

Python3 module packaging

# simplified python module structure
# ./vertex-ai-poc
# ├── __init__.py
# ├── trainer
# │   ├── __init__.py
# │   └── task.py
# └── setup.py
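The question does not show task.py itself. For anyone reproducing the packaging flow, a minimal stand-in entry point is enough; this stub is purely illustrative (the --epochs flag is a hypothetical example, not from the original code):

```python
# trainer/task.py -- illustrative stub only; the real training code is
# not shown in the question. The --epochs flag is a hypothetical example.
import argparse


def main(argv=None):
    parser = argparse.ArgumentParser(description="toy Vertex AI training task")
    parser.add_argument("--epochs", type=int, default=1)
    args = parser.parse_args(argv)
    for _ in range(args.epochs):
        pass  # the actual training loop would go here
    return args.epochs


if __name__ == "__main__":
    main()
```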

python3 ./[PATH]/vertex-ai-poc/setup.py sdist --formats=gztar
# -> dist generated

gsutil cp dist/trainer-0.2.tar.gz gs://[PROJECT_ID]/vertex-ai-poc/trainer-0.2.tar.gz
# -> uploaded correctly
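Before uploading, it can save a round trip to list the sdist's members locally and confirm that setup.py and the trainer modules actually made it in. A small sketch (the helper name check_sdist is my own, not part of any tool):

```python
import tarfile


def check_sdist(path):
    """List the members of an sdist and verify it contains a setup.py."""
    with tarfile.open(path, "r:gz") as tf:
        names = tf.getnames()
    if not any(n.split("/")[-1] == "setup.py" for n in names):
        raise ValueError(f"{path} contains no setup.py: {names}")
    return names
```

For example, check_sdist("dist/trainer-0.2.tar.gz") before the gsutil cp; an sdist built from the wrong directory typically fails this check or ships with no package modules.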

Submit Custom Job

gcloud ai custom-jobs create \
    --region us-central1 \
    --display-name=vertex-ai-poc \
    --project=[PROJECT_ID] \
    --python-package-uris='gs://[PROJECT_ID]/vertex-ai-poc/trainer-0.2.tar.gz' \
    --worker-pool-spec=machine-type=e2-standard-4,replica-count=1,executor-image-uri='us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-7:latest',python-module=trainer.task

However, I am encountering the below errors.

Error Messages

file:///user_dir/trainer-0.2.tar.gz does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.

cf. I notice the file:/// URI has three slashes, and I believe this has something to do with Docker. [3]


References

Upvotes: 1

Views: 1832

Answers (1)

Jiho Choi

Reputation: 1311

I ended up fixing the problem, and I'll share the situation for those of you hitting a similar error. The problem was that I wasn't using find_packages() correctly.

First, there are three possible ways of submitting custom vertex-ai jobs.

  1. auto packaging
  2. without auto packaging - Custom container image
  3. without auto packaging - Python App
    1. using local-package-path param
    2. using --python-package-uris flag

(I believe) methods 1, 2, and 3.1 build a Docker image on the local machine and submit the built image to Vertex AI, while method 3.2 simply takes a pre-built container and installs the Python package into the executor-image-uri container on Vertex AI.

The problem was that when I ran the command below to generate the dist package, I ran it from ../.. using ./[PATH]/, so find_packages() did not pick up the package correctly, which caused both methods 3.1 and 3.2 to fail.

# Wrong (run from ../..): python3 ./[PATH]/vertex-ai-poc/setup.py sdist --formats=gztar
# Right (run from the package root):
python3 ./setup.py sdist --formats=gztar

# setup.py
from setuptools import find_packages, setup

setup(
    name='trainer',
    version='0.1',
    packages=find_packages(),  # <-- HERE
    include_package_data=True,
)
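The failure mode can be reproduced in isolation: find_packages() scans the directory it is given (the current working directory by default), not the directory containing setup.py, which is why invoking setup.py from ../.. produced an empty package list. A self-contained sketch (the temp layout mirrors the question's tree, minus the top-level __init__.py):

```python
import os
import tempfile

from setuptools import find_packages

# Recreate the layout from the question in a temp directory.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "vertex-ai-poc")
os.makedirs(os.path.join(pkg, "trainer"))
for name in ("trainer/__init__.py", "trainer/task.py", "setup.py"):
    open(os.path.join(pkg, name), "w").close()

# Scanning the package root finds the trainer package ...
print(find_packages(where=pkg))   # ['trainer']

# ... but scanning the parent (what effectively happens when setup.py
# is invoked from ../..) finds nothing, so the sdist ships no packages.
print(find_packages(where=root))  # []
```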

With the packaging fixed, both the local-package-path and --python-package-uris variants below work.

3.1 without auto packaging - Python App - using local-package-path param
gcloud ai custom-jobs create \
    --region us-central1 \
    --display-name=vertex-ai-poc \
    --project=[PROJECT_ID] \
    --worker-pool-spec=machine-type=e2-standard-4,replica-count=1,executor-image-uri='us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-7:latest',script=task.py,local-package-path=vertex-ai-poc/trainer
3.2 Without auto packaging - Python App - using --python-package-uris flag
gcloud ai custom-jobs create \
    --region us-central1 \
    --display-name=vertex-ai-poc \
    --project=[PROJECT_ID] \
    --python-package-uris='gs://[PROJECT_ID]/vertex-ai-poc/trainer-0.1.tar.gz' \
    --worker-pool-spec=machine-type=e2-standard-4,replica-count=1,executor-image-uri='us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-7:latest',python-module=trainer.task
Results


Upvotes: 1
