ankit

Reputation: 347

Azure ML Studio: OSError: Cannot save file into a non-existent directory

I am using Azure ML Studio to read data from a CSV file through a data asset named test5, and to write data to a CSV file in my current working directory. The write is failing. I submit the job using a compute cluster and a custom environment, following the instructions from this tutorial.

I have written the code in a notebook as:

 from azure.ai.ml import MLClient
 from azure.identity import DefaultAzureCredential
 credential = DefaultAzureCredential()
 ml_client = MLClient(
     credential=credential,
     subscription_id="abc",
     resource_group_name="xyz",
     workspace_name="pqr",
 )
 from azure.ai.ml import command
 from azure.ai.ml import Input
    
 registered_model_name = "read_data"
 env_name = "docker-context"
 job = command(
     inputs=dict(
         data=Input(type="uri_file", path="azureml:test5:1"),
         registered_model_name=registered_model_name,
     ),
     code="./src/",
     command="python main.py --data ${{inputs.data}} --registered_model_name ${{inputs.registered_model_name}}",
     environment="docker-context:10",
     compute="amlcluster01",
     experiment_name="read_data1",
     display_name="read_data2",
 )
 ml_client.create_or_update(job)

This works fine. The content of the main.py file is:

 import os
 import argparse
 import pandas as pd
    
 def main():
     print("Hello")
     # input and output arguments
     parser = argparse.ArgumentParser()
     parser.add_argument("--data", type=str, help="path to input data")
     parser.add_argument("--registered_model_name", type=str, help="model name")
     args = parser.parse_args()
     print(" ".join(f"{k}={v}" for k, v in vars(args).items()))
     print("input data:", args.data)
     read_data=pd.read_csv(args.data)
     #read_data=pd.read_parquet(args.data, engine='pyarrow')
     #credit_df = pd.read_excel(args.data, header=1, index_col=0)
     print(read_data)
     read_data.to_csv(r'/home/azureuser/cloudfiles/code/Users/Ankit19.Gupta/azureml-in-a-day/src/file3.csv')
    
     print("Hello World !")
    
 if __name__ == "__main__":
     main()

Every line of this script runs fine except read_data.to_csv(r'/home/azureuser/cloudfiles/code/Users/Ankit19.Gupta/azureml-in-a-day/src/file3.csv').

That line fails with: OSError: Cannot save file into a non-existent directory: /home/azureuser/cloudfiles/code/Users/Ankit19.Gupta/azureml-in-a-day/src.

How can I save a dataframe to a CSV file in my current working directory from a job? Any help would be appreciated.

Edit: Following the code msamsami suggested in the answer, when I replace read_data.to_csv(r'/home/azureuser/cloudfiles/code/Users/Ankit19.Gupta/azureml-in-a-day/src/file3.csv') with

file_path = os.path.join(os.getcwd(), 'file3.csv')
read_data.to_csv(file_path)
print(f"read_data saved to {file_path}")

I get the following output of the above print statement:

read_data saved to /mnt/azureml/cr/j/28bec1ee580a400894b48e5d8576f6ca/exe/wd/file3.csv

Upvotes: 0

Views: 2114

Answers (1)

frisko

Reputation: 877

The error is telling you that the directory you want to save the CSV into doesn't exist. To confirm, check that an "azureml-in-a-day" folder exists under "Ankit19.Gupta", and that a "src" folder exists under "azureml-in-a-day". Keep in mind that the job runs on the compute cluster, so the folders have to exist on the filesystem the job sees, which is not necessarily the one you see from the notebook.
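One way to run that check from inside the job itself is to walk up the target path and report the first folder that is missing. This is only a minimal sketch: `first_missing_dir` is a hypothetical helper name, not part of pandas or the Azure ML SDK, and it relies on nothing beyond `os.path`.

```python
import os


def first_missing_dir(path):
    """Return the shallowest missing ancestor directory of `path`,
    or None if every directory in the chain exists.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    missing = None
    # Walk upwards until we hit a directory that actually exists.
    while not os.path.isdir(directory):
        missing = directory
        parent = os.path.dirname(directory)
        if parent == directory:  # reached the filesystem root
            break
        directory = parent
    return missing
```

Calling `print(first_missing_dir(...))` in main.py with the path from the question should show where the path stops existing on the machine that runs the job.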

Also, try this:

import os

...

file_path = os.path.join(os.getcwd(), 'file3.csv')
read_data.to_csv(file_path)

print(f"read_data saved to {file_path}")
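If you would rather write into a subfolder than directly into the working directory, the folder has to be created before `to_csv` is called, because pandas does not create missing directories. The sketch below assumes the job's working directory is writable. The helper name `save_csv` and the default folder name `outputs` are illustrative choices. Azure ML normally collects a relative `outputs` folder with the job's artifacts, but treat that as something to verify for your workspace.

```python
import os


def save_csv(df, directory="outputs", filename="file3.csv"):
    """Create `directory` if it is missing, then write `df` into it as CSV.

    `df` is expected to be a pandas DataFrame (anything with a `to_csv`
    method works). Hypothetical helper for illustration only.
    """
    # exist_ok=True makes this safe to call when the folder is already there.
    os.makedirs(directory, exist_ok=True)
    file_path = os.path.join(directory, filename)
    df.to_csv(file_path)
    return file_path
```

In main.py this would be used as `file_path = save_csv(read_data)`, followed by the same print statement as above.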

Upvotes: 1
