Reputation: 43
I have deployed a custom Kubeflow pipeline using a mix of AutoML components and a custom Kubeflow Component.
When I deploy the pipeline, it fails and I get the following error:
{
  textPayload: "The replica workerpool0-0 exited with a non-zero status of 1. Termination reason: Error. To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=205438435937&resource=ml_job%2Fjob_id%XXXXXXXXXXXXXXXX&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%XXXXXXXXXXXXXXXXXXXX%22"
  insertId: "ibt166bgd"
  resource: {
    type: "ml_job"
    labels: {
      job_id: "XXXXXXXXXXXXXXXXXX"
      task_name: "service"
      project_id: "XXXXXXX-XXXXXX"
    }
  }
  timestamp: "2021-06-10T12:18:53.807150835Z"
  severity: "ERROR"
  labels: {
    ml.googleapis.com/endpoint: ""
  }
  logName: "projects/XXXXXXX-XXXXXX/logs/ml.googleapis.com%XXXXXXXXXXXXXXXXXXXX"
  receiveTimestamp: "2021-06-10T12:18:55.087983509Z"
}
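The log entry above only reports that the replica exited with status 1; the underlying traceback should be in the job's own logs, which the URL in textPayload points to. As a minimal sketch, the filter encoded in that URL's advancedFilter parameter can be rebuilt and pasted into the Logs Explorer. The helper names and the job ID below are hypothetical, and I am assuming that Cloud Logging treats newline-separated comparisons as ANDed together.

```python
from urllib.parse import quote, unquote


def build_ml_job_filter(job_id: str) -> str:
    """Build a filter matching log entries of a single ml_job resource."""
    # Two comparisons on separate lines, mirroring the decoded advancedFilter.
    return f'resource.type="ml_job"\nresource.labels.job_id="{job_id}"'


def encode_filter(log_filter: str) -> str:
    """Percent-encode the filter so it can be used as a URL query value."""
    # safe="" forces '=', '"' and the newline to be encoded as well.
    return quote(log_filter, safe="")


job_filter = build_ml_job_filter("1234567890")  # hypothetical job ID
encoded = encode_filter(job_filter)

print(job_filter)
print(encoded)
```

Decoding the encoded string with unquote gives back the readable filter, which is a quick way to check what a log URL like the one above is actually querying.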
This is my pipeline configuration:
# Imports assumed from the aliases used below
import kfp
from kfp.v2 import compiler
from kfp.v2.google.client import AIPlatformClient
from google_cloud_pipeline_components import aiplatform as gcc_aip

# Kubeflow pipeline defined by a Python function
@kfp.dsl.pipeline(
    name="sales-prediction-iowa",
    pipeline_root=pipeline_root_path)
def pipeline(project_id: str):
    # Custom component (defined elsewhere) that pre-processes the data
    pre_process = preprocess(
        project_id=project_id,
    )
    # Create the tabular dataset from the custom component's output
    create_dataset = gcc_aip.TabularDatasetCreateOp(
        project=project_id,
        display_name=display_name,
        # gcs_source="gs://vertex-ai-pipeline-bucket/iowa-2020_pre-processed.csv"
        gcs_source=pre_process.output
    )
    # Train an AutoML regression model on the dataset
    training_job_run_op = gcc_aip.AutoMLTabularTrainingJobRunOp(
        project=project_id,
        display_name="training-iowa-sales",
        optimization_prediction_type="regression",
        dataset=create_dataset.outputs["dataset"],
        model_display_name="iowa-sales-model",
        target_column="sale_dollars",
        training_fraction_split=0.8,
        validation_fraction_split=0.1,
        test_fraction_split=0.1,
        budget_milli_node_hours=8000,
    )
    # Deploy the trained model; outputs is indexed by key, like "dataset" above
    endpoint_op = gcc_aip.ModelDeployOp(
        project=project_id, model=training_job_run_op.outputs["model"]
    )

# Compile the pipeline to a job spec
compiler.Compiler().compile(pipeline_func=pipeline,
                            package_path='iowa-pipeline-job.json')

# Submit the compiled job spec
api_client = AIPlatformClient(project_id=project_id, region=region)
response = api_client.create_run_from_job_spec(
    'iowa-pipeline-job.json',
    pipeline_root=pipeline_root_path,
    service_account=service_account,
    parameter_values={
        'project_id': project_id,
        # 'region': region,
        # 'pipeline_root_path': pipeline_root_path,
        # 'service_account': service_account,
        # 'display_name': display_name
    }
)
I have a sneaking suspicion that it might be linked to regions, but please let me know if there is something else going on here.
Thanks in advance!
Upvotes: 1
Views: 1089