Reputation: 1092
I created a Vertex AI pipeline that performs a simple ML flow: create a dataset, train a model on it, and then predict on the test set. The model is trained in a Python function-based component (train-logistic-model). In that component I specify an invalid package, so the pipeline step fails. I know this because the step succeeds once I correct the package name. For the failed pipeline, however, I cannot see any logs. When I click "VIEW JOB" under "Execution Info" on the pipeline's Runtime Graph, it takes me to the "CUSTOM JOB" page for the job the pipeline ran, which shows this message:
Custom job failed with error message: The replica workerpool0-0 exited with a non-zero status of 1 ...
When I click the VIEW LOGS button, it takes me to the Logs Explorer, where there are NO logs. Why are there no logs? Do I need to enable logging somewhere in the pipeline? Or could it be a permission issue? Nothing mentions permissions, though; the Logs Explorer shows only this message, with 0 logs below it:
Showing logs for time specified in query. To view more results update your query
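That Logs Explorer message says results are limited to the query's time range, so one thing worth ruling out is a window that does not cover the failed run. Below is a minimal sketch that builds a filter with explicit time bounds. It assumes Vertex AI custom job logs are indexed under `resource.type="ml_job"` with a `resource.labels.job_id` label; compare this with the query the VIEW LOGS button opens before relying on it. The helper name and the job ID are hypothetical placeholders.

```python
from datetime import datetime, timedelta, timezone


def custom_job_log_filter(job_id: str, start: datetime, end: datetime) -> str:
    """Build a Cloud Logging filter for one Vertex AI custom job.

    Assumption: logs use resource.type="ml_job" and resource.labels.job_id.
    """
    # Cloud Logging accepts RFC 3339 timestamps; normalise both bounds to UTC.
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start_utc = start.astimezone(timezone.utc).strftime(fmt)
    end_utc = end.astimezone(timezone.utc).strftime(fmt)
    return (
        'resource.type="ml_job" '
        f'AND resource.labels.job_id="{job_id}" '
        f'AND timestamp>="{start_utc}" '
        f'AND timestamp<="{end_utc}"'
    )


# Widen the window to the last 7 days so an earlier failed run is included.
now = datetime.now(timezone.utc)
print(custom_job_log_filter("23456789101234567892", now - timedelta(days=7), now))
```

The resulting string can be pasted into the Logs Explorer query box, or passed as `filter_=` to `google.cloud.logging.Client().list_entries()`.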
Upvotes: 2
Views: 1608
Reputation: 2793
I ran into this as well. Apparently logging can fail on Vertex AI for steps that run on a small machine type with a GPU attached. You need to increase the machine size for the logs to show up.
From the docs:
Additionally, using smaller machine types like n1-highmem-2 with GPUs might cause logging to fail for some workloads because of CPU constraints. If your training job stops returning logs, consider selecting a larger machine type.
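If you launch the training step as a custom job yourself, the machine type is set in the worker pool spec. Here is a minimal sketch that builds one spec dict in the shape `aiplatform.CustomJob(worker_pool_specs=[...])` accepts and warns when a GPU is paired with a machine type the docs flag as too small. The helper name, the default `n1-standard-8`, and the image URI are assumptions for illustration; only n1-highmem-2 is named in the docs quoted above.

```python
import warnings

# Only n1-highmem-2 is named in the quoted docs; anything else you add
# here is your own judgment, not a documented limit.
SMALL_GPU_HOSTS = {"n1-highmem-2"}


def worker_pool_spec(image_uri, machine_type="n1-standard-8",
                     accelerator_type=None, accelerator_count=0):
    """Return one worker pool spec dict (hypothetical helper)."""
    if accelerator_count and machine_type in SMALL_GPU_HOSTS:
        # The docs warn that logging may fail here because of CPU constraints.
        warnings.warn(
            f"{machine_type} with a GPU may drop logs; consider a larger machine type",
            stacklevel=2,
        )
    machine_spec = {"machine_type": machine_type}
    if accelerator_type and accelerator_count:
        machine_spec["accelerator_type"] = accelerator_type
        machine_spec["accelerator_count"] = accelerator_count
    return {
        "machine_spec": machine_spec,
        "replica_count": 1,
        "container_spec": {"image_uri": image_uri},
    }


# Hypothetical image URI; substitute your own training container.
spec = worker_pool_spec(
    "us-docker.pkg.dev/my-project/my-repo/trainer:latest",
    machine_type="n1-standard-8",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)
print(spec["machine_spec"])
```

The dict would then be passed as `aiplatform.CustomJob(display_name="train-logistic-model", worker_pool_specs=[spec]).run()`. For a KFP component, the rough equivalent is raising the task's resources, for example with `set_cpu_limit()` and `set_memory_limit()` in KFP v2.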
Upvotes: 1
Reputation: 1
from google.cloud import aiplatform

def get_custom_job_status(client, job_name):
    """Return the state and error message of a Vertex AI custom job."""
    response = client.get_custom_job(name=job_name)
    # response.error is only populated when the job failed or was cancelled
    return response.state.name, response.error.message

location = "us-central1"  # region the pipeline ran in
api_endpoint = f"{location}-aiplatform.googleapis.com"
client_options = {"api_endpoint": api_endpoint}
client = aiplatform.gapic.JobServiceClient(client_options=client_options)

# Full resource name of the custom job launched by the failed pipeline step
job_name = "projects/{project-id}/locations/{your-location}/customJobs/{custom-job-id}"
state, error_message = get_custom_job_status(client, job_name)
print(state, error_message)
The custom job name has this format:
projects/123456789101/locations/us-central1/customJobs/23456789101234567892
The name can be found in the component logs.
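Because the API endpoint has to use the same region that appears in the job name, it can help to derive both from the name itself so the `location` variable cannot drift from the job being queried. This is a hypothetical convenience function, not part of the Vertex AI SDK; it only parses the `projects/.../locations/.../customJobs/...` format shown above.

```python
import re

# Resource name format for a Vertex AI custom job.
_CUSTOM_JOB_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/customJobs/(?P<job_id>[^/]+)$"
)


def parse_custom_job_name(name: str) -> dict:
    """Split a custom job resource name into project, location and job ID."""
    match = _CUSTOM_JOB_RE.match(name)
    if match is None:
        raise ValueError(f"not a custom job resource name: {name!r}")
    parts = match.groupdict()
    # The regional endpoint should match the job's location.
    parts["api_endpoint"] = f"{parts['location']}-aiplatform.googleapis.com"
    return parts


info = parse_custom_job_name(
    "projects/123456789101/locations/us-central1/customJobs/23456789101234567892"
)
print(info["api_endpoint"])  # us-central1-aiplatform.googleapis.com
```

`info["api_endpoint"]` can then be passed as `client_options={"api_endpoint": ...}` when creating the `JobServiceClient`.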
Upvotes: 0