Reputation: 966
I have 8,742 wav files (amounting to around 7.1GB) and would like to get the raw data into a tf.data.Dataset.
My first attempt is below. Please note that I used the soundfile package as the wav files have different bit depths, and some are 24-bit per sample. It is my understanding that many packages do not offer support for 24-bit wav files.
import tensorflow as tf
import soundfile

filepaths = tf.gfile.Glob('michael/dataset/wav_filepaths/*.wav')  # Get the files into a list
labels = get_labels()  # pseudo function to obtain the labels corresponding to the audio
raw_audio = []  # List to hold raw audio lists. These are 2-channel wavs so this will be a 3D list

# Create a list where each element is the raw audio data of one file
for f in filepaths:
    try:
        data, sample_rate = soundfile.read(f)  # 2 channels
        raw_audio.append(data.tolist())
    except Exception as err:  # Poor practice to catch all exceptions like this, but it is just an example
        print('Exception')
        print(f)

training_set = tf.data.Dataset.from_tensor_slices((raw_audio, labels))
The problem with this solution is that it is horrendously slow as soundfile reads all of the raw data and stores it all in a list.
I am now considering a solution whereby I initially store the filenames and corresponding labels in a tf.data.Dataset. I would then create a mapping function that calls soundfile.read, or possibly even uses tensorflow.contrib.framework.python.ops.audio_ops, and returns only the raw audio and the corresponding label. The function would be applied with tf.data.Dataset.map so that the whole process becomes part of the graph and is parallelised.
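A rough sketch of this map-based idea (the paths, labels, and random stand-in audio below are hypothetical placeholders, and I use the current tf.py_function name for the escape hatch that lets a Python function like soundfile.read run inside the map):

```python
import numpy as np
import tensorflow as tf

# Hypothetical paths and labels, purely for illustration
filepaths = ['a.wav', 'b.wav']
labels = [0, 1]

def load_wav(path, label):
    def _read(p):
        # With real files this would be:
        # data, _ = soundfile.read(p.numpy().decode()); return data.astype(np.float32)
        return np.random.rand(16000, 2).astype(np.float32)  # stand-in for decoded audio
    audio = tf.py_function(_read, [path], tf.float32)
    audio.set_shape([None, 2])  # 2-channel audio of unknown length
    return audio, label

dataset = (tf.data.Dataset.from_tensor_slices((filepaths, labels))
           .map(load_wav, num_parallel_calls=tf.data.experimental.AUTOTUNE))
```

Note that because the reading happens in a Python callback, parallelism is limited by the GIL; a pure-TensorFlow decode op would parallelise better.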
My first concern with the proposed solution is that it is not ideal and seems a bit "hacky" to store filenames in a dataset to later be replaced by corresponding data. My second concern is that the GPU I am using (1080Ti with 11GB memory) could run out of memory.
Please provide a better (in particular, faster) way to get raw audio data from a large set of wav files into a tf.data.Dataset.
Upvotes: 3
Views: 1507
Reputation: 241
You can try using a generator function that feeds the data into the pipeline. Take a look at https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_generator
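As a rough sketch (the filenames, labels, and random stand-in audio are placeholders; soundfile.read would supply the real samples):

```python
import numpy as np
import tensorflow as tf

# Hypothetical file list and labels for illustration only
filepaths = ['a.wav', 'b.wav']
labels = [0, 1]

def wav_generator():
    # Yields one (audio, label) pair at a time, so nothing is held in memory up front
    for path, label in zip(filepaths, labels):
        # data, sample_rate = soundfile.read(path)  # the real decoding step
        data = np.random.rand(16000, 2).astype(np.float32)  # stand-in for 2-channel audio
        yield data, label

dataset = tf.data.Dataset.from_generator(
    wav_generator,
    output_types=(tf.float32, tf.int32),
    output_shapes=(tf.TensorShape([None, 2]), tf.TensorShape([])))
```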
Upvotes: 2
Reputation: 59681
Although you could in theory read the files with tf.read_file and decode them with tf.contrib.ffmpeg.decode_audio, the usual approach for this kind of case is to convert the data to TFRecord format and read it with a tf.data.TFRecordDataset. This blog post shows an example of how to do that; in your case you would need a script that reads each WAV file, decodes it and writes the vector of samples (I suppose a 32-bit value would be the simplest way) into the file. Note that if you want to batch multiple audio files into a tensor, either they must all have the same size or you would have to use tf.data.Dataset.padded_batch to form proper tensors.
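A rough sketch of the write-then-read round trip (random stand-in audio instead of real wav decoding, a temporary file path, and the tf.io names, which correspond to the older tf.python_io / tf.parse_single_example APIs):

```python
import os
import tempfile
import numpy as np
import tensorflow as tf

record_path = os.path.join(tempfile.mkdtemp(), 'audio.tfrecords')

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

# Writing: one Example per wav file; the sample vector is stored as raw float32 bytes.
# With real data, `samples` would come from soundfile.read(path).
with tf.io.TFRecordWriter(record_path) as writer:
    for label in (0, 1):
        samples = np.random.rand(16000, 2).astype(np.float32)  # stand-in audio
        example = tf.train.Example(features=tf.train.Features(feature={
            'audio': _bytes_feature(samples.tobytes()),
            'label': _int64_feature(label),
        }))
        writer.write(example.SerializeToString())

# Reading: parse each record and restore the (num_samples, 2) shape.
def parse(record):
    parsed = tf.io.parse_single_example(record, {
        'audio': tf.io.FixedLenFeature([], tf.string),
        'label': tf.io.FixedLenFeature([], tf.int64),
    })
    audio = tf.reshape(tf.io.decode_raw(parsed['audio'], tf.float32), [-1, 2])
    return audio, parsed['label']

dataset = tf.data.TFRecordDataset(record_path).map(parse)
```

The conversion is a one-off cost; afterwards the pipeline reads large sequential records instead of opening thousands of small wav files.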
Upvotes: 2