Reputation: 845
I'm building a system that gathers training images and uploads them automatically into a Vertex AI image object detection dataset. I've saved the files to a Google Cloud Storage bucket along with a JSONL file containing labels and bounding boxes. If I import the JSONL file through the Vertex AI console on the web, everything works: the images are added to the dataset along with their bounding boxes.
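For context, each line of the JSONL file is built roughly like this. It is a minimal sketch rather than my exact code: `make_record` is a hypothetical helper, and the bucket path, label and coordinates are made-up values. The field names follow the image bounding box import format as I understand it, with coordinates normalized to the range 0 to 1.

```python
import json


def make_record(image_uri, boxes):
    """Build one import record for a single image.

    `boxes` is a list of (label, x_min, y_min, x_max, y_max) tuples,
    with coordinates normalized to the range [0, 1].
    """
    return {
        "imageGcsUri": image_uri,
        "boundingBoxAnnotations": [
            {
                "displayName": label,
                "xMin": x_min,
                "yMin": y_min,
                "xMax": x_max,
                "yMax": y_max,
            }
            for label, x_min, y_min, x_max, y_max in boxes
        ],
    }


# Made-up example values, for illustration only.
records = [
    make_record("gs://my-bucket/images/img_0001.jpg", [("widget", 0.1, 0.2, 0.5, 0.6)]),
]

# JSONL means one JSON object per line.
jsonl_string = "\n".join(json.dumps(record) for record in records)
```

The resulting `jsonl_string` is what gets uploaded to the bucket before the import call.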
When I add them through the API, I first thought there were no bounding boxes. On closer inspection, the difference is where they end up: when I import through the Vertex AI console, the bounding boxes land alongside the existing ones in the annotation set called 'my_dataset_iod', but when I import through the API I inadvertently create a new annotation set called something like 'my_dataset_image_bounding_box_2023_09_19_080419'. I can also see an ID for 'my_dataset_iod' in the URL (annotationSetId=1197.....), but I can't figure out how to tell import_data that this is the annotation set I want to use.
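One thing I have not verified: the data format documentation appears to describe an optional `annotationResourceLabels` map on each bounding box, where the key `aiplatform.googleapis.com/annotation_set_name` names an existing annotation set by its display name. The sketch below shows how that could be attached to each record before writing the JSONL file. `tag_annotation_set` is a hypothetical helper, and whether this actually routes the import into 'my_dataset_iod' is an assumption.

```python
import copy

# Label key as I understand it from the import format docs (assumption, untested).
ANNOTATION_SET_LABEL = "aiplatform.googleapis.com/annotation_set_name"


def tag_annotation_set(record, annotation_set_name):
    """Return a copy of `record` whose bounding boxes name a target annotation set."""
    tagged = copy.deepcopy(record)  # leave the caller's record untouched
    for box in tagged.get("boundingBoxAnnotations", []):
        labels = box.setdefault("annotationResourceLabels", {})
        labels[ANNOTATION_SET_LABEL] = annotation_set_name
    return tagged


# Made-up record, for illustration only.
example = {
    "imageGcsUri": "gs://my-bucket/images/img_0001.jpg",
    "boundingBoxAnnotations": [
        {"displayName": "widget", "xMin": 0.1, "yMin": 0.2, "xMax": 0.5, "yMax": 0.6},
    ],
}
tagged_example = tag_annotation_set(example, "my_dataset_iod")
```

The value passed in would be the display name of the annotation set ('my_dataset_iod' here), not the numeric annotationSetId from the URL.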
Here's my code for reference:
from datetime import datetime
from google.cloud import storage
from google.cloud import aiplatform
.....
current_datetime = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
jsonl_file_name = f"data_{current_datetime}.jsonl"
jsonl_blob = bucket.blob(jsonl_file_name)
jsonl_blob.upload_from_string(jsonl_string, content_type='application/jsonl')
dataset = aiplatform.ImageDataset(f"projects/{project_id}/locations/us-central1/datasets/{dataset_id}")
dataset.import_data(
    gcs_source=[f"gs://{bucket_name}/{jsonl_file_name}"],
    import_schema_uri='gs://google-cloud-aiplatform/schema/dataset/ioformat/image_bounding_box_io_format_1.0.0.yaml',
)
Upvotes: 1
Views: 217