Reputation: 347
I am using Azure ML Studio
to read data from a csv file (via a data asset named test5)
and to write data into a csv file in my current working directory (the write step is failing). I am submitting a Job using a Compute Cluster
and a Custom Environment,
and I am following the instructions from this tutorial.
I have written the code in a notebook as:
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
credential = DefaultAzureCredential()
ml_client = MLClient(
    credential=credential,
    subscription_id="abc",
    resource_group_name="xyz",
    workspace_name="pqr",
)
from azure.ai.ml import command
from azure.ai.ml import Input

registered_model_name = "read_data"
env_name = "docker-context"

job = command(
    inputs=dict(
        data=Input(type="uri_file", path="azureml:test5:1"),
        registered_model_name=registered_model_name,
    ),
    code="./src/",
    command="python main.py --data ${{inputs.data}} --registered_model_name ${{inputs.registered_model_name}}",
    environment="docker-context:10",
    compute="amlcluster01",
    experiment_name="read_data1",
    display_name="read_data2",
)
ml_client.create_or_update(job)
This works fine. The content of the main.py
file is:
import os
import argparse
import pandas as pd

def main():
    print("Hello")

    # input and output arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", type=str, help="path to input data")
    parser.add_argument("--registered_model_name", type=str, help="model name")
    args = parser.parse_args()
    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))

    print("input data:", args.data)
    read_data = pd.read_csv(args.data)
    # read_data = pd.read_parquet(args.data, engine='pyarrow')
    # credit_df = pd.read_excel(args.data, header=1, index_col=0)
    print(read_data)
    read_data.to_csv(r'/home/azureuser/cloudfiles/code/Users/Ankit19.Gupta/azureml-in-a-day/src/file3.csv')
    print("Hello World !")

if __name__ == "__main__":
    main()
Here, every line of code works fine except read_data.to_csv(r'/home/azureuser/cloudfiles/code/Users/Ankit19.Gupta/azureml-in-a-day/src/file3.csv').
It fails with the error message: OSError: Cannot save file into a non-existent directory: /home/azureuser/cloudfiles/code/Users/Ankit19.Gupta/azureml-in-a-day/src.
Can anyone please help me save the dataframe to a csv file in my current working directory through a Job? Any help would be appreciated.
Edit: Following the code msamsami suggested in the answer, when I replace read_data.to_csv(r'/home/azureuser/cloudfiles/code/Users/Ankit19.Gupta/azureml-in-a-day/src/file3.csv')
with
file_path = os.path.join(os.getcwd(), 'file3.csv')
read_data.to_csv(file_path)
print(f"read_data saved to {file_path}")
I get the following output from the print statement:
read_data saved to /mnt/azureml/cr/j/28bec1ee580a400894b48e5d8576f6ca/exe/wd/file3.csv
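That /mnt/azureml/.../wd path is the job's temporary working directory on the compute cluster node, so the file written there is not kept after the run. One way to persist a file produced by a job is to declare a job output and write into it; a minimal sketch, assuming the `azure.ai.ml` `Output` type (the output name `result_dir` and the `--output_dir` argument are hypothetical names chosen for illustration):

```python
from azure.ai.ml import command, Input, Output

job = command(
    inputs=dict(
        data=Input(type="uri_file", path="azureml:test5:1"),
    ),
    # result_dir is a hypothetical output name; AzureML mounts a writable
    # folder at this path and uploads its contents to the workspace
    # datastore when the job finishes
    outputs=dict(result_dir=Output(type="uri_folder")),
    code="./src/",
    command="python main.py --data ${{inputs.data}} --output_dir ${{outputs.result_dir}}",
    environment="docker-context:10",
    compute="amlcluster01",
)
```

Inside main.py, the script would then accept `--output_dir` via argparse and call something like read_data.to_csv(os.path.join(args.output_dir, 'file3.csv')), so the CSV lands in the mounted output folder rather than the ephemeral working directory.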
Upvotes: 0
Views: 2114
Reputation: 877
The error is telling you that the directory you wish to save the CSV into doesn't exist. So, to make sure that everything is right, check that an "azureml-in-a-day" folder exists under "Ankit19.Gupta", and likewise that a "src" folder exists under "azureml-in-a-day".
Also, try this:
import os
...
file_path = os.path.join(os.getcwd(), 'file3.csv')
read_data.to_csv(file_path)
print(f"read_data saved to {file_path}")
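If the target directory might not exist yet, it can also be created before saving; a minimal, self-contained sketch using only the standard library and pandas (the folder name `out_dir` and the small DataFrame are placeholders standing in for the real data):

```python
import os
import pandas as pd

# stand-in for the read_data DataFrame loaded in main.py
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# build the target path and create the directory if it is missing
out_dir = os.path.join(os.getcwd(), "out_dir")
os.makedirs(out_dir, exist_ok=True)

file_path = os.path.join(out_dir, "file3.csv")
df.to_csv(file_path, index=False)
print(f"read_data saved to {file_path}")
```

This avoids the OSError entirely, because `os.makedirs(..., exist_ok=True)` guarantees the directory exists before `to_csv` runs.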
Upvotes: 1