Reputation: 542
I've seen several answers to this, but none of them seem to work.
I have a .npy file in a blob storage container and want to use it in a machine learning workspace (I am using Azure Notebooks).
How can I access it and load it into memory to begin training models on it?
Dataset.Tabular does not have npy as an accepted file type to import into the notebook, but it does have csv and parquet. My array has multiple dimensions, so I'm not sure either of these would work for me. Or is there an easy way to change my .npy into .csv while keeping the same structure?
Upvotes: 0
Views: 478
Reputation: 23141
Regarding the issue, you can convert the NumPy array into a pandas DataFrame. Then you can either create a TabularDataset directly from the DataFrame, or write the DataFrame to csv or parquet and create the TabularDataset from that.
For example:
# convert the np array into a pandas dataframe
from datetime import datetime, timedelta

import numpy as np
import pandas as pd
from azure.storage.blob import BlobPermissions
from azure.storage.blob.baseblobservice import BaseBlobService

account_name = '<your account name>'
account_key = '<your account key>'
container_name = '<your container name>'
blob_name = '<your blob name>'

blob_service = BaseBlobService(
    account_name=account_name,
    account_key=account_key
)

# generate a short-lived, read-only SAS token for the blob
sas_token = blob_service.generate_blob_shared_access_signature(
    container_name,
    blob_name,
    permission=BlobPermissions.READ,
    expiry=datetime.utcnow() + timedelta(hours=1))
print(sas_token)

url_with_sas = blob_service.make_blob_url(container_name, blob_name, sas_token=sas_token)
print(url_with_sas)

# np.DataSource downloads the remote file to a local cache before opening it
ds = np.DataSource()
# ds = np.DataSource(None)  # use a temporary file instead
# ds = np.DataSource(path)  # cache under a path like `data/`
f = ds.open(url_with_sas, 'rb')  # .npy is a binary format, so open in binary mode
dat = np.load(f)  # np.load, not np.fromfile, understands the .npy header

# a DataFrame is 2-D; reshape higher-dimensional arrays first,
# e.g. dat = dat.reshape(dat.shape[0], -1)
df = pd.DataFrame(dat)
# create the dataset and register it in the workspace
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
datastore = ws.get_default_datastore()
training_data = Dataset.Tabular.register_pandas_dataframe(
    df, datastore, 'EthereumRates')
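If you'd rather take the second route (write the DataFrame to parquet first and create the dataset from the file), a minimal sketch follows. The local path `data/my_array.parquet`, the datastore folder `npy-data/`, and the dataset name `my-dataset` are placeholders, not names from your workspace:
# alternative: save the dataframe as parquet, upload it, then register a TabularDataset
import os

os.makedirs('data', exist_ok=True)
local_path = 'data/my_array.parquet'   # placeholder local file
df.to_parquet(local_path)              # needs pyarrow or fastparquet installed

# upload to the default datastore and create the dataset from the uploaded file
datastore.upload_files(files=[local_path], target_path='npy-data/', overwrite=True)
training_data = Dataset.Tabular.from_parquet_files(
    path=(datastore, 'npy-data/my_array.parquet'))
training_data = training_data.register(workspace=ws, name='my-dataset')

# in your training code, pull the dataset back into memory
df_train = training_data.to_pandas_dataframe()
Either way, note that a TabularDataset is two-dimensional, so this only round-trips the reshaped array; keep the original shape handy if you need to restore it later with reshape.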
For more details, please refer to these blog posts.
Upvotes: 1