Reputation: 21
I have built a custom container which uses my managed dataset on Vertex AI to run my training code. It worked successfully when I created the training job through the Vertex AI web interface.
But now I'm trying to create the training job from a Python script using the class
google.cloud.aiplatform.CustomContainerTrainingJob
I load a managed dataset that I have on Vertex AI with
dataset = aiplatform.ImageDataset(dataset_id) if dataset_id else None
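For context, here is a minimal sketch of how the SDK and the job might be set up before this point; the project ID, region, bucket, and container image URIs below are placeholders, not values from my actual setup:

from google.cloud import aiplatform

# All values below are hypothetical placeholders.
aiplatform.init(
    project="my-project",                      # hypothetical GCP project ID
    location="us-central1",                    # hypothetical region
    staging_bucket="gs://my-staging-bucket",   # hypothetical staging bucket
)

job = aiplatform.CustomContainerTrainingJob(
    display_name="custom-image-training",              # hypothetical job name
    container_uri="gcr.io/my-project/trainer:latest",  # hypothetical training image
    model_serving_container_image_uri="gcr.io/my-project/serving:latest",  # hypothetical serving image
)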
But when I try to run the following code:
model = job.run(
    dataset=dataset,
    model_display_name=model_display_name,
    args=args,
    replica_count=replica_count,
    machine_type=machine_type,
    accelerator_type=accelerator_type,
    accelerator_count=accelerator_count,
    training_fraction_split=training_fraction_split,
    validation_fraction_split=validation_fraction_split,
    test_fraction_split=test_fraction_split,
    sync=sync,
)
model.wait()
print(model.display_name)
print(model.resource_name)
print(model.uri)
return model
I got the following error:
google.api_core.exceptions.FailedPrecondition: 400 'annotation_schema_uri' should be set in the TrainingPipeline.input_data_config for custom training or hyperparameter tuning with managed dataset.
I feel like something is missing, because when I create the job on the website I specify an export directory for the managed dataset, but I haven't found where to do that here.
Any ideas?
Thank you
Upvotes: 0
Views: 1116
Reputation: 21
Well, I found the answer in the documentation: the data are automatically exported to the provided bucket, so that was not the issue. The issue was exactly what the error said (obviously). To provide a valid annotation schema URI, it is enough to add a parameter to run():
annotation_schema_uri=aiplatform.schema.dataset.annotation.image.classification
image.classification is what I needed here, but it can be replaced by, for example, text.extraction if you are doing text extraction.
This passes the following string value, which is the gs:// URI the error asks for:
gs://google-cloud-aiplatform/schema/dataset/annotation/image_classification_1.0.0.yaml
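Putting it together, here is a sketch of the corrected run() call with the annotation schema added; the display name, machine type, and split fractions are illustrative placeholders, not values from my actual job:

model = job.run(
    dataset=dataset,
    # The fix: tell the pipeline how the managed dataset is annotated.
    annotation_schema_uri=aiplatform.schema.dataset.annotation.image.classification,
    model_display_name="my-model",   # hypothetical display name
    replica_count=1,
    machine_type="n1-standard-4",    # hypothetical machine type
    training_fraction_split=0.8,     # illustrative split
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
)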
Upvotes: 2