B. Bogart

Reputation: 1075

Write to a mounted filesystem in azureml with azureml-sdk

I am trying to use an AmlCompute instance to preprocess my data. To do so, I need to be able to write the processed data back to the datastore. I am taking this approach because the cluster shuts down automatically when the job completes, so I can let it run until it is done without worrying about paying for more time than is needed.

The problem is when I try to write back to the datastore (which is mounted as a dataset) I get the following error:

OSError: [Errno 30] Read-only file system: '/mnt/batch/tasks/shared/LS_root/jobs/[...]/wav_test'

I have set the access policy for my datastore to allow read, add, create, write, delete, and list, but I don't think that is the issue because I can already write to the datastore from the Microsoft Azure File Explorer.

Is there a way to mount a datastore directly or through a dataset with write privileges from the azureml python sdk?

Alternatively, is there a better way to preprocess this (audio) data on azure for machine learning?

Thanks!

EDIT: I'm adding an example that illustrates the problem.

from azureml.core import Workspace, Dataset, Datastore
import os

ws = Workspace.from_config()
ds = Dataset.get_by_name(ws, name='birdsongs_alldata')

mount_context = ds.mount()
mount_context.start()

os.listdir(mount_context.mount_point)

output:

['audio_10sec', 'mp3', 'npy', 'resources', 'wav']

So the file system is mounted and visible.

# try to write to the mounted file system
outfile = os.path.join(mount_context.mount_point, 'test.txt')

with open(outfile, 'w') as f:
    f.write('test')

Error:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-9-1b15714faded> in <module>
      1 outfile = os.path.join(mount_context.mount_point, 'test.txt')
      2 
----> 3 with open(outfile, 'w') as f:
      4     f.write('test')

OSError: [Errno 30] Read-only file system: '/tmp/tmp8ltgsx6x/test.txt'

Upvotes: 4

Views: 2009

Answers (1)

Daniel Labbe

Reputation: 2019

I've simulated the same scenario in my environment and it has worked. Could you please share the code and the full error message in the question?

Regarding the cost concerns, you can use the Azure ML Python SDK (the azureml.core.compute module) to start a compute target, wait until it is running, and stop it. This gives you more control over how long the compute stays running (start, execute, stop).
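A minimal sketch of that start, execute, stop pattern. It assumes a compute object exposing `start()` and `stop()` methods that accept `wait_for_completion` and `show_output`, as `azureml.core.compute.ComputeInstance` does in SDK v1. The helper name `run_on_compute`, the `job` callable, and the instance name in the commented usage are hypothetical.

```python
def run_on_compute(compute, job):
    """Start the compute, run job(), and always stop the compute afterwards."""
    # Block until the compute reports that it has started.
    compute.start(wait_for_completion=True, show_output=False)
    try:
        return job()
    finally:
        # Stop even if job() raised, so the compute does not stay running.
        compute.stop(wait_for_completion=True, show_output=False)


# Hypothetical usage with SDK v1 (needs a workspace and an existing instance):
# from azureml.core import Workspace
# from azureml.core.compute import ComputeInstance
# ws = Workspace.from_config()
# instance = ComputeInstance(workspace=ws, name='my-instance')  # placeholder name
# run_on_compute(instance, lambda: print('preprocessing...'))
```

The `try`/`finally` is the point of the sketch: the stop call runs whether the preprocessing step succeeds or fails.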

The best way to preprocess audio files depends somewhat on their content. If the audio contains voice, I strongly recommend using the Azure Cognitive Services Speech API (speech-to-text).

If it's not voice, you can use the wave module, like in the code below:

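import numpy
from wave import open as open_wave

# <filename> is a placeholder for the path to your .wav file
waveFile = open_wave(<filename>, 'rb')
nframes = waveFile.getnframes()
wavFrames = waveFile.readframes(nframes)
waveFile.close()
# numpy.fromstring is deprecated for binary data; frombuffer assumes 16-bit samples
ys = numpy.frombuffer(wavFrames, dtype=numpy.int16)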

Credits

This method is not specific to Azure, but it will let you work with the data in a structured way.
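To try the wave approach without any audio at hand, here is a self-contained sketch that uses only the standard library. It writes a short 16-bit mono tone to a file and reads the samples back. The helper names `write_tone` and `read_samples`, the file name in the usage comment, and the tone parameters are made up for illustration, and `array` stands in for numpy.

```python
import array
import math
import wave


def write_tone(path, freq=440.0, rate=8000, n_samples=800):
    """Write a 16-bit mono sine tone with n_samples samples to path."""
    samples = array.array('h', (
        int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / rate))
        for i in range(n_samples)
    ))
    with wave.open(path, 'wb') as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wf.setframerate(rate)
        wf.writeframes(samples.tobytes())


def read_samples(path):
    """Return (sample_rate, samples) for a 16-bit WAV file."""
    with wave.open(path, 'rb') as wf:
        if wf.getsampwidth() != 2:
            raise ValueError('this sketch only handles 16-bit samples')
        rate = wf.getframerate()
        frames = wf.readframes(wf.getnframes())
    samples = array.array('h')  # signed 16-bit integers
    samples.frombytes(frames)
    return rate, samples


# Usage (the path is a placeholder):
# write_tone('tone.wav')
# rate, samples = read_samples('tone.wav')
```

For a stereo file the returned samples are interleaved (left, right, left, right, ...), so they would need to be split per channel before further processing.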

Upvotes: 2
