racerX

Reputation: 1092

Unable to view Vertex AI pipeline node logs

I created a Vertex AI pipeline that runs a simple ML flow: create a dataset, train a model on it, and then predict on the test set. The model is trained in a Python function-based component (train-logistic-model). In that component I specified an invalid package, so the step fails. I know this is the cause because the step worked once I corrected the package name. However, I cannot see any logs for the failed pipeline. When I click "VIEW JOB" under "Execution Info" on the pipeline Runtime Graph (pic attached), it takes me to the "CUSTOM JOB" page for the job the pipeline ran. That page shows this message:

Custom job failed with error message: The replica workerpool0-0 exited with a non-zero status of 1 ...

When I click the VIEW LOGS button, it takes me to the Logs Explorer, where there are NO logs. Why are there no logs? Do I need to enable logging somewhere in the pipeline? Or could it be a permission issue? Nothing mentions one, though; the Logs Explorer only shows this message, with 0 logs below it:

Showing logs for time specified in query. To view more results update your query


Upvotes: 2

Views: 1608

Answers (2)

Robbe

Reputation: 2793

I ran into this as well. Apparently logging can fail on Vertex AI for steps that run on a small machine type with a GPU. You need to use a larger machine type for logging to work.

From the docs:

Additionally, using smaller machine types like n1-highmem-2 with GPUs might cause logging to fail for some workloads because of CPU constraints. If your training job stops returning logs, consider selecting a larger machine type.

Upvotes: 1

Avinash Gunda

Reputation: 1

Find the custom job ID of the failed step in the component logs and paste it into the code below:

from google.cloud import aiplatform

# Placeholders: replace these with your own values.
location = "us-central1"
job_name = "projects/{project-id}/locations/{your-location}/customJobs/{custom-job-id}"

# The job service has to be called on the regional endpoint.
api_endpoint = f"{location}-aiplatform.googleapis.com"
client_options = {"api_endpoint": api_endpoint}
client = aiplatform.gapic.JobServiceClient(client_options=client_options)

def get_status_helper(client, name):
    # Return the state of the custom job as a string.
    response = client.get_custom_job(name=name)
    return str(response.state)

# Fetch the job and print its state and error details.
job = client.get_custom_job(name=job_name)
print(get_status_helper(client, job_name))
print(job.error)

Sample name (custom job ID) for reference:

projects/123456789101/locations/us-central1/customJobs/23456789101234567892

This name can be found in the component logs.
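Once you have a resource name in that form, you can also build a Logs Explorer query from it by hand instead of relying on the VIEW LOGS button. The following is a minimal sketch, not an official recipe. The helper names `parse_custom_job_name` and `build_log_filter` are made up for illustration, and the filter assumes that custom job logs are written under the `ml_job` resource type with a `job_id` label, which you should verify against your own project.

```python
import re

# Assumed layout of a custom job resource name, taken from the sample name above.
_CUSTOM_JOB_RE = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/customJobs/(?P<job_id>[^/]+)$"
)


def parse_custom_job_name(name):
    """Split a custom job resource name into project, location and job_id."""
    match = _CUSTOM_JOB_RE.match(name)
    if match is None:
        raise ValueError(f"not a custom job resource name: {name!r}")
    return match.groupdict()


def build_log_filter(name):
    """Build a Logs Explorer filter string for one custom job.

    Assumption: the job's logs use resource.type "ml_job" and carry the
    job ID in resource.labels.job_id.
    """
    job_id = parse_custom_job_name(name)["job_id"]
    return f'resource.type="ml_job" AND resource.labels.job_id="{job_id}"'


if __name__ == "__main__":
    sample = (
        "projects/123456789101/locations/us-central1"
        "/customJobs/23456789101234567892"
    )
    print(build_log_filter(sample))
```

Paste the printed filter into the Logs Explorer query box. Because the message in the question says logs are shown only "for time specified in query", it may also be worth widening the time range so that it covers the failed run.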

Upvotes: 0
