Reputation: 966
I have 8,742 wav files (amounting to around 7.1GB) and would like to get the raw data into a tf.data.Dataset.
My first attempt is below. Please note that I used the soundfile package as the wav files have different bit depths, and some are 24-bit per sample. It is my understanding that many packages do not offer support for 24-bit wav files.
import tensorflow as tf
import soundfile

filepaths = tf.gfile.Glob('michael/dataset/wav_filepaths/*.wav')  # Get the files into a list
labels = get_labels()  # pseudo function to obtain the labels corresponding to the audio
raw_audio = []  # List to hold raw audio lists. These are 2-channel wavs so this will be a 3D list

# Create a list where each element is the raw audio data of one file
for f in filepaths:
    try:
        data, sample_rate = soundfile.read(f)  # 2 channels
        raw_audio.append(data.tolist())
    except Exception as err:  # Poor practice to catch all exceptions like this, but it is just an example
        print('Exception')
        print(f)

training_set = tf.data.Dataset.from_tensor_slices((raw_audio, labels))
The problem with this solution is that it is horrendously slow as soundfile reads all of the raw data and stores it all in a list.
I am now considering a solution whereby I initially store the filenames and corresponding labels in a tf.data.Dataset. I would then create a mapping function that calls soundfile.read, or possibly even uses tensorflow.contrib.framework.python.ops.audio_ops, and returns only the raw audio and the corresponding label. The function would be applied with tf.data.Dataset.map so that the whole process becomes part of the graph and is parallelised.
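A rough sketch of this map-based idea (the paths, labels, and random stand-in audio below are hypothetical placeholders, and I use the current tf.py_function name for the escape hatch that lets a Python function like soundfile.read run inside the map):

```python
import numpy as np
import tensorflow as tf

# Hypothetical paths and labels, purely for illustration
filepaths = ['a.wav', 'b.wav']
labels = [0, 1]

def load_wav(path, label):
    def _read(p):
        # With real files this would be:
        # data, _ = soundfile.read(p.numpy().decode()); return data.astype(np.float32)
        return np.random.rand(16000, 2).astype(np.float32)  # stand-in for decoded audio
    audio = tf.py_function(_read, [path], tf.float32)
    audio.set_shape([None, 2])  # 2-channel audio of unknown length
    return audio, label

dataset = (tf.data.Dataset.from_tensor_slices((filepaths, labels))
           .map(load_wav, num_parallel_calls=tf.data.experimental.AUTOTUNE))
```

Note that because the reading happens in a Python callback, parallelism is limited by the GIL; a pure-TensorFlow decode op would parallelise better.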
My first concern with the proposed solution is that it is not ideal and seems a bit "hacky" to store filenames in a dataset to later be replaced by corresponding data. My second concern is that the GPU I am using (1080Ti with 11GB memory) could run out of memory.
Please provide a better (in particular, faster) way to get raw audio data from a large set of wav files into a tf.data.Dataset.
Upvotes: 3
Views: 1507
Reputation: 241
You can try using a generator function that feeds the data into the pipeline. Take a look at https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_generator
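As a rough sketch (the filenames, labels, and random stand-in audio are placeholders; soundfile.read would supply the real samples):

```python
import numpy as np
import tensorflow as tf

# Hypothetical file list and labels for illustration only
filepaths = ['a.wav', 'b.wav']
labels = [0, 1]

def wav_generator():
    # Yields one (audio, label) pair at a time, so nothing is held in memory up front
    for path, label in zip(filepaths, labels):
        # data, sample_rate = soundfile.read(path)  # the real decoding step
        data = np.random.rand(16000, 2).astype(np.float32)  # stand-in for 2-channel audio
        yield data, label

dataset = tf.data.Dataset.from_generator(
    wav_generator,
    output_types=(tf.float32, tf.int32),
    output_shapes=(tf.TensorShape([None, 2]), tf.TensorShape([])))
```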
Upvotes: 2
Reputation: 59681
Although you could in theory read the files with tf.read_file and decode them with tf.contrib.ffmpeg.decode_audio, the usual approach for this kind of case is to convert the data to TFRecord format and read it with a tf.data.TFRecordDataset. This blog post shows an example of how to do that; in your case you would need a script that reads each WAV file, decodes it and writes the vector of samples (I suppose a 32-bit value would be the simplest way) into the file. Note that if you want to batch multiple audio files into a tensor, either they must all have the same size or you would have to use tf.data.Dataset.padded_batch to form proper tensors.
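A rough sketch of the write-then-read round trip (random stand-in audio instead of real wav decoding, a temporary file path, and the tf.io names, which correspond to the older tf.python_io / tf.parse_single_example APIs):

```python
import os
import tempfile
import numpy as np
import tensorflow as tf

record_path = os.path.join(tempfile.mkdtemp(), 'audio.tfrecords')

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

# Writing: one Example per wav file; the sample vector is stored as raw float32 bytes.
# With real data, `samples` would come from soundfile.read(path).
with tf.io.TFRecordWriter(record_path) as writer:
    for label in (0, 1):
        samples = np.random.rand(16000, 2).astype(np.float32)  # stand-in audio
        example = tf.train.Example(features=tf.train.Features(feature={
            'audio': _bytes_feature(samples.tobytes()),
            'label': _int64_feature(label),
        }))
        writer.write(example.SerializeToString())

# Reading: parse each record and restore the (num_samples, 2) shape.
def parse(record):
    parsed = tf.io.parse_single_example(record, {
        'audio': tf.io.FixedLenFeature([], tf.string),
        'label': tf.io.FixedLenFeature([], tf.int64),
    })
    audio = tf.reshape(tf.io.decode_raw(parsed['audio'], tf.float32), [-1, 2])
    return audio, parsed['label']

dataset = tf.data.TFRecordDataset(record_path).map(parse)
```

The conversion is a one-off cost; afterwards the pipeline reads large sequential records instead of opening thousands of small wav files.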
Upvotes: 2