TensorFlow Dataset API: Consuming NumPy arrays with padded batching

Question

Quick disclaimer: this is my first time asking a question here on Stack Overflow.

Now on to the question itself. I am having problems using the fairly new Dataset API of TensorFlow 1.4 to read variable-length inputs from NumPy arrays in combination with padded batching.

According to the official documentation (https://www.tensorflow.org/programmers_guide/datasets#consuming_numpy_arrays), using arrays as input is both supported and straightforward. The crux is that the data has to be fed into TensorFlow placeholders before the padded_batch method of a dataset object can be applied to it. The NumPy representation of variable-length inputs, however, is ragged (the rows have different lengths), so it is interpreted as a sequence of arrays rather than as a single rectangular array, and it cannot be fed into a placeholder.

But isn't the whole point of providing a padded_batch method that a sequence of variable-length inputs can be processed by the dataset? Long story short, have any of you experienced a similar situation and found a solution for it? Thank you so much for your help!

Below are some code snippets that might help clarify the problem.

What the input looks like:

array([array([65,  3, 96, 94], dtype=int32), array([88], dtype=int32),
       array([113,  52, 106,  57,   3,  86], dtype=int32),
       array([88,  3, 23, 91], dtype=int32), ... ])
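To make the structure of this input concrete, here is a minimal NumPy-only sketch, assuming three rows copied from the sample above. It shows that the outer array is a one-dimensional array of dtype object, and that converting it to int32 fails. That conversion is the same kind of np.asarray call that appears in the stack trace further down. The names rows, ragged and conversion_error are illustrative and not part of the original code.

```python
import numpy as np

# Variable-length rows, copied from the sample input above.
rows = [
    np.array([65, 3, 96, 94], dtype=np.int32),
    np.array([88], dtype=np.int32),
    np.array([113, 52, 106, 57, 3, 86], dtype=np.int32),
]

# dtype=object is passed explicitly because recent NumPy releases refuse
# to build a ragged array implicitly.
ragged = np.array(rows, dtype=object)
print(ragged.shape, ragged.dtype)

# Mirror the conversion that the session performs on a fed value:
# np.asarray(subfeed_val, dtype=subfeed_dtype).
try:
    np.asarray(ragged, dtype=np.int32)
    conversion_error = None
except (ValueError, TypeError) as exc:
    # ValueError is what the traceback reports; TypeError is caught
    # only defensively, in case a NumPy version behaves differently.
    conversion_error = exc

print(type(conversion_error).__name__, conversion_error)
```

Since the outer array holds three Python object references rather than a 2-D block of integers, there is no rectangular shape that a [None, None] placeholder could accept.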

The actual code snippet where the dataset is defined and populated:


for fold, (train_idx, dev_idx) in enumerate(sss.split(X, y)):

    X_train = X[train_idx]
    y_train = y[train_idx]

    X_dev = X[dev_idx]
    y_dev = y[dev_idx]

    tf.reset_default_graph()
    with tf.Session() as sess:

        features_placeholder = tf.placeholder(tf.int32, [None, None], name='input_x')
        labels_placeholder = tf.placeholder(tf.int32, [None, num_classes], name='input_y')

        dataset = tf.data.Dataset.from_tensor_slices((features_placeholder, labels_placeholder))

        dataset = dataset.shuffle(buffer_size=len(train_idx))
        dataset = dataset.padded_batch(batch_size, padded_shapes=([None], [None]), padding_values=(1, 0))

        iterator = dataset.make_initializable_iterator()
        next_element = iterator.get_next()

        sess.run(iterator.initializer, feed_dict={features_placeholder: np.array(X_train),
                                                  labels_placeholder: np.array(y_train)})

And finally, the corresponding stack trace from the Jupyter notebook:


ValueError                                Traceback (most recent call last)
 in ()
----> 1 cnn.train2(X_idx, y_bin, n_splits=5)

 in train2(self, X, y, n_splits)
    480 
    481                     self.session.run(iterator.initializer, feed_dict={features_placeholder: np.array(X_train),
--> 482                                                                       labels_placeholder: np.array(y_train)})
    483 #                     self.session.run(iterator.initializer)
    484 

~/.virtualenvs/ravenclaw/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    887     try:
    888       result = self._run(None, fetches, feed_dict, options_ptr,
--> 889                          run_metadata_ptr)
    890       if run_metadata:
    891         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/.virtualenvs/ravenclaw/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1087             feed_handles[subfeed_t] = subfeed_val
   1088           else:
-> 1089             np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
   1090 
   1091           if (not is_tensor_handle_feed and

~/.virtualenvs/ravenclaw/lib/python3.6/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
    490 
    491     """
--> 492     return array(a, dtype, copy=False, order=order)
    493 
    494 

ValueError: setting an array element with a sequence.

Thanks again for your support.
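For reference, one possible workaround is to pad the rows to a common length in NumPy before feeding them, so that the placeholder receives a rectangular int32 array. The sketch below assumes right-padding and uses a hypothetical helper, pad_rows, that is not part of TensorFlow or of the original code. It pads every row to the longest length in the whole array, which sidesteps padded_batch instead of using it, so it may waste memory when a few rows are much longer than the rest.

```python
import numpy as np

def pad_rows(rows, pad_value=0, dtype=np.int32):
    """Right-pad variable-length 1-D sequences to a common length.

    Hypothetical helper for illustration; assumes `rows` is non-empty.
    """
    max_len = max(len(row) for row in rows)
    # Start from an array filled with the padding value, then copy each
    # row into the leading positions of its slot.
    padded = np.full((len(rows), max_len), pad_value, dtype=dtype)
    for i, row in enumerate(rows):
        padded[i, :len(row)] = row
    return padded

rows = [
    np.array([65, 3, 96, 94], dtype=np.int32),
    np.array([88], dtype=np.int32),
    np.array([113, 52, 106, 57, 3, 86], dtype=np.int32),
]

# pad_value=1 mirrors the padding value used for the features in the
# padded_batch call of the question.
X_padded = pad_rows(rows, pad_value=1)
print(X_padded.shape, X_padded.dtype)
```

An array such as X_padded has a regular two-dimensional shape, so it could be fed to a placeholder of shape [None, None]. An alternative that keeps padded_batch in play is to build the dataset from a Python generator with tf.data.Dataset.from_generator, yielding one variable-length row at a time; that route is not shown here.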
