Reputation: 12487
Context
I want to train a custom model using YOLO (v8). I've got it working on my local machine, but it is very slow, and I want to run the job on Azure Machine Learning Studio for efficiency. I am using Azure ML SDK v2.
Issue
When I run on Azure ML, I get an error saying that YOLO cannot locate my training images.
Traceback (most recent call last):
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/ultralytics/yolo/engine/trainer.py", line 125, in __init__
    self.data = check_det_dataset(self.args.data)
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/ultralytics/yolo/data/utils.py", line 243, in check_det_dataset
    raise FileNotFoundError(msg)
FileNotFoundError:
Dataset 'custom.yaml' not found ⚠️, missing paths ['/mnt/azureml/cr/j/18bdc3371eca4975a0c4a7123f9adaec/exe/wd/valid/images']
Code / analysis
Here is the code I use to run the job:
command_job = command(
    display_name='Test Run 1',
    code="./src/",
    command="yolo detect train data=custom.yaml model=yolov8n.pt epochs=1 imgsz=1280 seed=42",
    environment="my-custom-env:3",
    compute=compute_target
)
On my local machine (using Visual Studio Code), the custom.yaml file is in the ./src/ directory. When I run the job above, custom.yaml is uploaded and appears in the Code section of the job (viewed in Azure ML Studio). From investigating, I think this is the compute working directory, which has the path:
'/mnt/azureml/cr/j/18bdc3371eca4975a0c4a7123f9adaec/exe/wd/'
My custom.yaml looks like this:
path: ../
train: train/images
val: valid/images
nc: 1
names: ["bike"]
So what is happening is that YOLO is reading my custom.yaml, using the working directory as the root path, and then trying to find valid/images within that directory:
'/mnt/azureml/cr/j/18bdc3371eca4975a0c4a7123f9adaec/exe/wd/valid/images'
My images are in my Datastore, not in that directory, hence the error.
What I have tried - updating the custom.yaml path
All my data (train and valid) is stored in Azure Blob Storage. In Azure ML Studio I have created a Datastore and added my data as a Dataset (referencing my Azure Blob Storage account). My file structure is:
Dataset/
- Train/
  - Images
  - Labels
- Valid/
  - Images
  - Labels
Within my custom.yaml file I have tried replacing path with the following:
**Storage URI**: https://mystorageaccount.blob.core.windows.net/my-datasets
**Datastore URI**: azureml://subscriptions/XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/resourcegroups/my-rg/workspaces/my_workspage/datastores/my_datastore/paths/Dataset/
If I do this I get the same error. This time it appends the path to the end of the working directory. Example:
'/mnt/azureml/cr/j/18bdc3371eca4975a0c4a7123f9adaec/exe/wd/https://mystorageaccount.blob.core.windows.net/my-datasets/valid/images'
What I have tried - mounting / downloading the dataset
I've read the Microsoft docs (e.g. here and here), and they say things like:
For most scenarios, you'll use URIs (uri_folder and uri_file) - a location in storage that can be easily mapped to the filesystem of a compute node in a job by either mounting or downloading the storage to the node.
It feels like I should be mapping my data (in my Datastore) to the compute filesystem. Then I could use that path in my custom.yaml. The documents are not clear on how to do that.
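From the docs, my best guess at the general shape is something like the sketch below (placeholder names, untested), but I can't see how the mounted path would end up inside custom.yaml:
from azure.ai.ml import command, Input
from azure.ai.ml.constants import AssetTypes, InputOutputModes

# Rough sketch only: 'my_datastore' and 'Dataset/' are placeholders for my actual names.
# As I understand the docs, declaring the storage location as a uri_folder input makes
# Azure ML mount (or download) it onto the compute node for the duration of the job.
job = command(
    code="./src/",
    command="ls ${{ inputs.data }}",  # inside the job this expands to a local filesystem path
    inputs={
        "data": Input(
            type=AssetTypes.URI_FOLDER,
            path="azureml://datastores/my_datastore/paths/Dataset/",
            mode=InputOutputModes.RO_MOUNT,  # mount read-only; DOWNLOAD also works
        )
    },
    environment="my-custom-env:3",
    compute=compute_target,
)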
In brief: how do I set up my data on Azure ML so that the path in my custom.yaml points to the data?
Upvotes: 5
Views: 2849
Reputation: 272
A solution is to create a folder data asset with a path of the form azureml://datastores/<data_store_name>/paths/<dataset-path> and pass it as an input to your Azure ML job. Azure ML jobs resolve the path of uri_folder inputs at runtime, so custom.yaml can be updated programmatically to contain this path.
Here is an example of an Azure ML job implementing this solution:
from azure.ai.ml import command
from azure.ai.ml import Input

command_job = command(
    inputs=dict(
        data=Input(
            type="uri_folder",
            path="azureml:your-data-asset:version-number",
        )
    ),
    command="""
    echo "The data asset path is ${{ inputs.data }}" &&
    # Update custom.yaml to contain the correct path
    sed -i "s|path:.*$|path: ${{ inputs.data }}|" custom.yaml &&
    # Now custom.yaml contains the correct path so we can run the training
    yolo detect train data=custom.yaml model=yolov8n.pt epochs=1 imgsz=1280 seed=42 project=your-experiment name=experiment
    """,
    code="./src/",
    environment="your-environment",
    compute="your-compute-target",
    experiment_name="your-experiment",
    display_name="your-display-name",
)
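If it helps, here is a minimal sketch of connecting to the workspace and submitting this job with the v2 SDK (the subscription, resource group and workspace values are placeholders):
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Connect to the workspace (placeholder values) and submit the job defined above.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)
returned_job = ml_client.jobs.create_or_update(command_job)
print(returned_job.studio_url)  # link to monitor the run in Azure ML Studio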
Note that you need to have the latest mlflow and azureml-mlflow libraries installed to make sure your model, parameters and metrics are logged with mlflow:
ultralytics==8.0.133
azureml-mlflow==1.52.0
mlflow==2.4.2
Edit: Note that I published tutorials explaining all the steps to run a YOLOv8 training with Azure ML:
In the blog post I create the Azure ML data asset from a local folder. In your case the dataset is already stored in a datastore, so you need to specify a path of the form azureml://datastores/<data_store_name>/paths/<dataset-path> instead of a local path when you create the data asset.
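For reference, a rough sketch of that registration step with the v2 SDK (the datastore name, folder path, asset name and version are placeholders for your own values, and ml_client is the client from the submission sketch above):
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# Register the folder that already lives in the datastore as a uri_folder data asset.
data_asset = Data(
    name="your-data-asset",
    version="1",
    type=AssetTypes.URI_FOLDER,
    path="azureml://datastores/<data_store_name>/paths/<dataset-path>",
    description="YOLOv8 training data (train/valid images and labels)",
)
ml_client.data.create_or_update(data_asset)
The resulting azureml:your-data-asset:1 reference is what goes into the Input path of the job above.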
Upvotes: 2