Reputation: 12487
Context
I want to train a custom model using YOLO (v8). I've got it working on my local machine, but it is very slow, and I want to run the job on Azure Machine Learning Studio for efficiency. I am using Azure ML SDK v2.
Issue
When I run on Azure ML, I get an error saying that YOLO cannot locate my training images.
Traceback (most recent call last):
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/ultralytics/yolo/engine/trainer.py", line 125, in __init__
    self.data = check_det_dataset(self.args.data)
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/ultralytics/yolo/data/utils.py", line 243, in check_det_dataset
    raise FileNotFoundError(msg)
FileNotFoundError:
Dataset 'custom.yaml' not found ⚠️, missing paths ['/mnt/azureml/cr/j/18bdc3371eca4975a0c4a7123f9adaec/exe/wd/valid/images']
Code / analysis
Here is the code I use to run the job:
command_job = command(
    display_name='Test Run 1',
    code="./src/",
    command="yolo detect train data=custom.yaml model=yolov8n.pt epochs=1 imgsz=1280 seed=42",
    environment="my-custom-env:3",
    compute=compute_target
)
On my local machine (using Visual Studio Code), the custom.yaml file is in the ./src/ directory. When I run the job above, custom.yaml is uploaded and appears in the Code section of the job (viewed in Azure ML Studio). From investigating, I think this is the compute working directory, which has the path:
'/mnt/azureml/cr/j/18bdc3371eca4975a0c4a7123f9adaec/exe/wd/'
My custom.yaml looks like this:
path: ../
train: train/images
val: valid/images
nc: 1
names: ["bike"]
So what is happening is that YOLO is reading my custom.yaml, using the working directory as the root path, and then trying to find valid/images within that directory:
'/mnt/azureml/cr/j/18bdc3371eca4975a0c4a7123f9adaec/exe/wd/valid/images'
My images are in my Datastore, not in that directory, hence the error.
What I have tried - updating the custom.yaml path
All my data (train and valid) is stored in Azure Blob Storage. In Azure ML Studio I have created a Datastore and added my data as a Dataset (referencing my Azure Blob Storage account). My file structure is:
Dataset/
- Train/
  - Images
  - Labels
- Valid/
  - Images
  - Labels
Within my custom.yaml file I have tried replacing path with the following:
**Storage URI**: https://mystorageaccount.blob.core.windows.net/my-datasets
**Datastore URI**: azureml://subscriptions/XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/resourcegroups/my-rg/workspaces/my_workspage/datastores/my_datastore/paths/Dataset/
If I do this I get the same error. This time it appends the path to the end of the working directory. Example:
'/mnt/azureml/cr/j/18bdc3371eca4975a0c4a7123f9adaec/exe/wd/https://mystorageaccount.blob.core.windows.net/my-datasets/valid/images'
What I have tried - mounting / downloading the dataset
I've read the Microsoft docs (e.g. here and here), and they say things like:
For most scenarios, you'll use URIs (uri_folder and uri_file) - a location in storage that can be easily mapped to the filesystem of a compute node in a job by either mounting or downloading the storage to the node.
It feels like I should be mapping my data (in my Datastore) to the compute filesystem. Then I could use that path in my custom.yaml. The documents are not clear on how to do that.
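From the docs, my best guess at the general shape is something like the sketch below (placeholder names, untested), but I can't see how the mounted path would end up inside custom.yaml:
from azure.ai.ml import command, Input
from azure.ai.ml.constants import AssetTypes, InputOutputModes

# Rough sketch only: 'my_datastore' and 'Dataset/' are placeholders for my actual names.
# As I understand the docs, declaring the storage location as a uri_folder input makes
# Azure ML mount (or download) it onto the compute node for the duration of the job.
job = command(
    code="./src/",
    command="ls ${{ inputs.data }}",  # inside the job this expands to a local filesystem path
    inputs={
        "data": Input(
            type=AssetTypes.URI_FOLDER,
            path="azureml://datastores/my_datastore/paths/Dataset/",
            mode=InputOutputModes.RO_MOUNT,  # mount read-only; DOWNLOAD also works
        )
    },
    environment="my-custom-env:3",
    compute=compute_target,
)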
In brief: how do I set up my data on Azure ML so that the path in my custom.yaml points to the data?
Upvotes: 5
Views: 2849
Reputation: 272
A solution is to create a folder data asset with a path of the form azureml://datastores/<data_store_name>/paths/<dataset-path> and pass it as an input to your Azure ML job. Azure ML jobs resolve the path of uri_folder inputs at runtime, so custom.yaml can be updated programmatically to contain this path.
Here is an example of an Azure ML job implementing this solution:
from azure.ai.ml import command
from azure.ai.ml import Input

command_job = command(
    inputs=dict(
        data=Input(
            type="uri_folder",
            path="azureml:your-data-asset:version-number",
        )
    ),
    command="""
    echo "The data asset path is ${{ inputs.data }}" &&
    # Update custom.yaml to contain the correct path
    sed -i "s|path:.*$|path: ${{ inputs.data }}|" custom.yaml &&
    # Now custom.yaml contains the correct path so we can run the training
    yolo detect train data=custom.yaml model=yolov8n.pt epochs=1 imgsz=1280 seed=42 project=your-experiment name=experiment
    """,
    code="./src/",
    environment="your-environment",
    compute="your-compute-target",
    experiment_name="your-experiment",
    display_name="your-display-name",
)
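If it helps, here is a minimal sketch of connecting to the workspace and submitting this job with the v2 SDK (the subscription, resource group and workspace values are placeholders):
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Connect to the workspace (placeholder values) and submit the job defined above.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)
returned_job = ml_client.jobs.create_or_update(command_job)
print(returned_job.studio_url)  # link to monitor the run in Azure ML Studio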
Note that you need to have the latest mlflow and azureml-mlflow libraries installed to make sure your model, parameters and metrics are logged with mlflow:
ultralytics==8.0.133
azureml-mlflow==1.52.0
mlflow==2.4.2
Edit: Note that I published tutorials explaining all the steps to run a YOLOv8 training with Azure ML:
In the blog post I create the Azure ML data asset from a local folder. In your case the dataset is already stored in a datastore, so you need to specify a path of the form azureml://datastores/<data_store_name>/paths/<dataset-path> instead of a local path when you create the data asset.
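For reference, a rough sketch of that registration step with the v2 SDK (the datastore name, folder path, asset name and version are placeholders for your own values, and ml_client is the client from the submission sketch above):
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# Register the folder that already lives in the datastore as a uri_folder data asset.
data_asset = Data(
    name="your-data-asset",
    version="1",
    type=AssetTypes.URI_FOLDER,
    path="azureml://datastores/<data_store_name>/paths/<dataset-path>",
    description="YOLOv8 training data (train/valid images and labels)",
)
ml_client.data.create_or_update(data_asset)
The resulting azureml:your-data-asset:1 reference is what goes into the Input path of the job above.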
Upvotes: 2