Reputation: 473
I have the following simple code:
import tensorflow as tf
import numpy as np
filename = # a list of wav filenames
x = tf.placeholder(tf.string)
def mfcc(x):
    feature = # some function written in NumPy to convert a wav file to MFCC features
    return feature
mfcc_fn = lambda x: mfcc(x)
# create a training dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x))
train_dataset = train_dataset.repeat()
train_dataset = train_dataset.map(mfcc_fn)
train_dataset = train_dataset.batch(100)
train_dataset = train_dataset.prefetch(buffer_size=1)
# create an iterator and iterate over training dataset
iterator = tf.data.Iterator.from_structure(train_dataset.output_types, train_dataset.output_shapes)
train_iterator = iterator.make_initializer(train_dataset)
with tf.Session() as sess:
    sess.run(train_iterator, feed_dict={x: filename})
Basically, the code creates a tf.data.Dataset object that loads a wav file and converts it to MFCC features. The conversion happens at train_dataset.map(mfcc_fn), where I apply an MFCC function written in NumPy to every input.
The code doesn't work because NumPy doesn't support operations on a tf.placeholder object. Is it possible to map a function over the input of a tf.data.Dataset if the function has to be written in NumPy? I don't use TensorFlow's built-in MFCC feature transformation because the FFT function in TensorFlow gives significantly different output from its NumPy counterpart (as illustrated here), and the model I am building relies on MFCC features generated using NumPy.
Upvotes: 2
Views: 4553
Reputation: 37
You can use a Python generator to do the NumPy processing and then pass that generator to tf.data.Dataset.from_generator.
For example:
import cv2
import tensorflow as tf

def sample_generator(image_paths):
    for image_path in image_paths:
        # Paths passed through `args` arrive as bytes, so decode them first
        img = cv2.imread(image_path.decode("utf-8"))
        # Do all the custom numpy things
        yield img

data_loader = tf.data.Dataset.from_generator(sample_generator,
                                             args=[image_paths],
                                             output_types=tf.int32,
                                             output_shapes=(None, None, 3))
This will create a TensorFlow data loader from the Python generator. You can read more about this here.
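Adapted to the wav/MFCC case from the question, the same idea might look like the sketch below. It assumes TensorFlow 2.x with eager execution. `fake_mfcc` is a hypothetical stand-in for the asker's NumPy MFCC routine (it only returns zeros whose frame count depends on the path length), and the file names are made up. Because real MFCC matrices usually have a varying number of frames, the sketch pads each batch to a common length.

```python
import numpy as np
import tensorflow as tf

N_COEFFS = 13  # assumed number of MFCC coefficients per frame


def fake_mfcc(path):
    # Hypothetical placeholder for the real NumPy MFCC function.
    # Returns a (frames, N_COEFFS) float32 array; frames varies per file.
    frames = len(path)
    return np.zeros((frames, N_COEFFS), dtype=np.float32)


def feature_generator(paths):
    for p in paths:
        # Values passed through `args` arrive as bytes.
        yield fake_mfcc(p.decode("utf-8"))


wav_paths = ["a.wav", "bb.wav"]  # made-up file names

ds = tf.data.Dataset.from_generator(
    feature_generator,
    args=[wav_paths],
    output_types=tf.float32,
    output_shapes=(None, N_COEFFS),
)

# Pad the variable-length time axis so examples can be batched together.
batched = ds.padded_batch(2, padded_shapes=(None, N_COEFFS))
```

Iterating over `batched` then yields float32 tensors of shape (batch, max_frames, 13), with all of the feature computation done in plain NumPy.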
Upvotes: 1
Reputation: 3094
You can achieve that with the tf.py_func function, or tf.py_function (which is the newer version). It does exactly what you want: it wraps your NumPy function that operates on arrays in a TensorFlow operation that you can include as part of your dataset graph.
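As a minimal sketch of how this could be wired into Dataset.map, assuming TensorFlow 2.x with eager execution: `numpy_feature` below is a hypothetical stand-in for the NumPy MFCC function (it just builds a fixed-size vector from the path length), and the file names are made up. The output shape is set by hand because TensorFlow cannot infer it from an arbitrary Python function.

```python
import numpy as np
import tensorflow as tf

N_COEFFS = 13  # assumed feature size


def numpy_feature(path_bytes):
    # Hypothetical placeholder for the real NumPy MFCC routine.
    # Receives the file name as bytes and returns a float32 vector.
    name = path_bytes.decode("utf-8")
    return np.full((N_COEFFS,), len(name), dtype=np.float32)


def _wrapper(path_tensor):
    # tf.py_function hands the wrapped function an eager tensor;
    # .numpy() turns the scalar string tensor into Python bytes.
    return numpy_feature(path_tensor.numpy())


def load_feature(path):
    feat = tf.py_function(_wrapper, [path], tf.float32)
    # Restore the static shape that is lost when crossing into Python.
    feat.set_shape([N_COEFFS])
    return feat


filenames = ["a.wav", "bb.wav", "ccc.wav"]  # made-up file names

dataset = (
    tf.data.Dataset.from_tensor_slices(filenames)
    .map(load_feature)
    .batch(2)
)
```

One design note: the wrapped function runs in the Python interpreter, so it is not serialized with the graph and is subject to the GIL even when map is given parallel calls.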
Upvotes: 4