CarterB

Reputation: 542

How to access np array in blob storage in azure notebook

I've seen several answers to this but none of them seem to work.

I have a .npy file in a blob storage container and want to use it in a machine learning workspace (I am using Azure Notebooks).

How can I access it and load it into memory to begin training models on it?

Dataset.Tabular does not list .npy as an accepted file type to import into the notebook, but it does support CSV and Parquet. My data has multiple dimensions, so I'm not sure either of those would work for me. Or is there an easy way to convert my .npy into .csv while keeping the same structure?
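One option for the "keep the same structure" part of the question (a sketch, not from the answer below): since CSV is inherently two-dimensional, you can flatten the trailing axes into columns, write the 2-D view out, and reshape back on load, as long as you record the original shape somewhere. A minimal round-trip using only NumPy (the array here is a made-up stand-in for the .npy contents):

```python
import io

import numpy as np

# hypothetical 3-D array standing in for the contents of the .npy file
arr = np.arange(24, dtype=float).reshape(2, 3, 4)

# CSV is inherently 2-D, so flatten the trailing axes into columns
flat = arr.reshape(arr.shape[0], -1)
buf = io.StringIO()  # in-memory stand-in for a .csv file
np.savetxt(buf, flat, delimiter=',')

# on load, restore the original shape (it must be recorded separately,
# e.g. in the file name or alongside the data)
buf.seek(0)
restored = np.loadtxt(buf, delimiter=',').reshape(2, 3, 4)
print(np.array_equal(arr, restored))  # → True
```

The default `np.savetxt` format (`%.18e`) has enough digits to round-trip double-precision values exactly, so no precision is lost in the conversion.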

Upvotes: 0

Views: 478

Answers (1)

Jim Xu

Reputation: 23141

Regarding the issue, you can convert the NumPy array into a pandas DataFrame. Then you can either create a TabularDataset directly from the DataFrame, or first write the DataFrame out as CSV or Parquet and create the TabularDataset from that.

For example

# download the .npy blob via a SAS URL and load it as a NumPy array
from datetime import datetime, timedelta

import numpy as np
from azure.storage.blob import BlobPermissions
from azure.storage.blob.baseblobservice import BaseBlobService

account_name = '<your account name>'
account_key = '<your account key>'
container_name = '<your container name>'
blob_name = '<your blob name>'

blob_service = BaseBlobService(
    account_name=account_name,
    account_key=account_key
)
# generate a short-lived, read-only SAS token for the blob
sas_token = blob_service.generate_blob_shared_access_signature(
    container_name,
    blob_name,
    permission=BlobPermissions.READ,
    expiry=datetime.utcnow() + timedelta(hours=1)
)
print(sas_token)
url_with_sas = blob_service.make_blob_url(container_name, blob_name, sas_token=sas_token)
print(url_with_sas)

# np.DataSource downloads the URL to a local cache
ds = np.DataSource()        # caches downloads in the current directory
# ds = np.DataSource(None)  # use a temporary directory instead
# ds = np.DataSource(path)  # use a specific directory, e.g. 'data/'
f = ds.open(url_with_sas, 'rb')  # binary mode is required for np.load
dat = np.load(f)                 # .npy files are read with np.load, not np.fromfile

import pandas as pd

# note: if the array has more than two dimensions, reshape it to 2-D first,
# e.g. pd.DataFrame(dat.reshape(dat.shape[0], -1))
df = pd.DataFrame(dat)

# create the dataset
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
datastore = ws.get_default_datastore()
training_data = Dataset.Tabular.register_pandas_dataframe(
    df, datastore, 'EthereumRates')

For more details, please refer to the blog posts.

Upvotes: 1

Related Questions