Jamil

Reputation: 858

How to use Azure Data Lake Store as an input data set for Azure ML?

I am moving data into Azure Data Lake Store and processing it using Azure Data Lake Analytics. The data is in XML form, and I am reading it with the XML Extractor. Now I want to access this data from Azure ML, but it looks like Azure Data Lake Store is not directly supported at the moment.

What are the possible ways to use Azure Data Lake Store with Azure ML?

Upvotes: 0

Views: 1284

Answers (2)

Roshan Kumar Gupta

Reputation: 1
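
A sketch adapted from the notebook linked below: register the ADLS Gen2 account as an Azure ML datastore (or fetch it if it is already registered), then read a file from it as a tabular dataset. The workspace setup, datastore name, and file paths are placeholders you must fill in.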


import os

from azureml.core import Dataset, Datastore, Workspace

workspace = Workspace.from_config()           # load the workspace from config.json
adlsgen2_datastore_name = '<datastore name>'  # placeholder: name to register the datastore under

account_name = os.getenv("ADLSGEN2_ACCOUNTNAME_62", "<storage account name>")  # ADLS Gen2 account name
tenant_id = os.getenv("ADLSGEN2_TENANT_62", "")  # tenant ID of the service principal
client_id = os.getenv("ADLSGEN2_CLIENTID_62", "")  # client ID of the service principal
client_secret = os.getenv("ADLSGEN2_CLIENT_SECRET_62", "")  # the secret of the service principal

try:
    # Reuse the datastore if it is already registered in the workspace
    adlsgen2_datastore = Datastore.get(workspace, adlsgen2_datastore_name)
    print("Found ADLS Gen2 datastore with name: %s" % adlsgen2_datastore_name)
    datastore_paths = [(adlsgen2_datastore, 'path to data.csv')]
    dataset = Dataset.Tabular.from_delimited_files(path=datastore_paths)
    df = dataset.to_pandas_dataframe()
    display(df)  # display() is available in notebook environments
    dataset = Dataset.Tabular.register_pandas_dataframe(
        df, adlsgen2_datastore, "<DataSetStep>", show_progress=True)
except Exception:
    # Otherwise register it, authenticating with a service principal
    adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
        workspace=workspace,
        datastore_name=adlsgen2_datastore_name,
        filesystem='fs',  # name of the ADLS Gen2 filesystem
        account_name=account_name,  # ADLS Gen2 account name
        tenant_id=tenant_id,  # tenant ID of the service principal
        client_id=client_id,  # client ID of the service principal
        client_secret=client_secret)  # the secret of the service principal
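
The try/except implements a get-or-register pattern: Datastore.get returns the datastore if it is already registered in the workspace, and only on failure does the code fall through to registering it with service-principal credentials. Note that the service principal must be granted access to the storage account (for example, a Storage Blob Data Reader role assignment) for reads to succeed.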

Reference: https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-data-transfer.ipynb

Upvotes: 0

Matt Winkler

Reputation: 315

Right now, Azure Data Lake Store is not a supported source, as you note. That said, Azure Data Lake Analytics can also write data out to Azure Blob storage, so one approach is to process the data in U-SQL and then stage the results in Blob storage for Azure Machine Learning to read, as sketched below. Once Azure ML supports Data Lake Store, you can switch over.
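
A minimal sketch of that staging approach using the azureml-core SDK (which postdates this answer); the datastore name, container, account details, and file path below are all hypothetical placeholders:

from azureml.core import Dataset, Datastore, Workspace

workspace = Workspace.from_config()  # load the workspace from config.json

# Register the Blob container that the U-SQL job stages its output to
# (all names below are hypothetical placeholders)
blob_datastore = Datastore.register_azure_blob_container(
    workspace=workspace,
    datastore_name='usql_staging',
    container_name='staged',
    account_name='<storage account name>',
    account_key='<storage account key>')

# Read the staged CSV as a tabular dataset and pull it into pandas
dataset = Dataset.Tabular.from_delimited_files(
    path=[(blob_datastore, 'output/processed.csv')])
df = dataset.to_pandas_dataframe()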

Upvotes: 4
