Reputation: 53
I'm using the Dataset API for input pipelines in TensorFlow (version: r1.2). I built my dataset and batched it with a batch size of 128. The dataset fed into the RNN.
Unfortunately, the dataset.output_shape
returns dimension(none) in the first dimension, so the RNN raises an error:
Traceback (most recent call last):
File "untitled1.py", line 188, in <module>
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "/home/harold/anaconda2/envs/tensorflow_py2.7/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "untitled1.py", line 121, in main
run_training()
File "untitled1.py", line 57, in run_training
is_training=True)
File "/home/harold/huawei/ConvLSTM/ConvLSTM.py", line 216, in inference
initial_state=initial_state)
File "/home/harold/anaconda2/envs/tensorflow_py2.7/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 566, in dynamic_rnn
dtype=dtype)
File "/home/harold/anaconda2/envs/tensorflow_py2.7/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 636, in _dynamic_rnn_loop
"Input size (depth of inputs) must be accessible via shape inference,"
ValueError: Input size (depth of inputs) must be accessible via shape inference, but saw value None.
I think this error is caused by the shape of input, the first dimension should be batch size but not none.
here is the code:
origin_dataset = Dataset.BetweenS_Dataset(FLAGS.data_path)
train_dataset = origin_dataset.train_dataset
test_dataset = origin_dataset.test_dataset
shuffle_train_dataset = train_dataset.shuffle(buffer_size=10000)
shuffle_batch_train_dataset = shuffle_train_dataset.batch(128)
batch_test_dataset = test_dataset.batch(FLAGS.batch_size)
iterator = tf.contrib.data.Iterator.from_structure(
shuffle_batch_train_dataset.output_types,
shuffle_batch_train_dataset.output_shapes)
(images, labels) = iterator.get_next()
training_init_op = iterator.make_initializer(shuffle_batch_train_dataset)
test_init_op = iterator.make_initializer(batch_test_dataset)
print(shuffle_batch_train_dataset.output_shapes)
I print output_shapes
and it gives:
(TensorShape([Dimension(None), Dimension(36), Dimension(100)]), TensorShape([Dimension(None)]))
I suppose that it should be 128, because I have batched dataset:
(TensorShape([Dimension(128), Dimension(36), Dimension(100)]), TensorShape([Dimension(128)]))
Upvotes: 5
Views: 3716
Reputation: 1856
This feature has been added with the drop_remainder
parameter used like the following:
batch_test_dataset = test_dataset.batch(FLAGS.batch_size, drop_remainder=True)
From the docs:
drop_remainder: (Optional.) A tf.bool scalar tf.Tensor, representing whether the last batch should be dropped in the case its has fewer than batch_size elements; the default behavior is not to drop the smaller batch.
Upvotes: 3
Reputation: 146
They hardcoded batch size in implementation and it always will return None (tf 1.3).
def _padded_shape_to_batch_shape(s):
return tensor_shape.vector(None).concatenate(
tensor_util.constant_value_as_shape(s))
In this way, they can batch all elements (e.g. dataset_size=14
, batch_size=5
, last_batch_size=4
).
You can use dataset.filter and dataset.map to fix this issue
d = contrib.data.Dataset.from_tensor_slices([[5] for x in range(14)])
batch_size = 5
d = d.batch(batch_size)
d = d.filter(lambda e: tf.equal(tf.shape(e)[0], batch_size))
def batch_reshape(e):
return tf.reshape(e, [args.batch_size] + [s if s is not None else -1 for s in e.shape[1:].as_list()])
d = d.map(batch_reshape)
r = d.make_one_shot_iterator().get_next()
print('dataset_output_shape = %s' % r.shape)
with tf.Session() as sess:
while True:
print(sess.run(r))
Output
dataset_output_shape = (5, 1)
[[5][5][5][5][5]]
[[5][5][5][5][5]]
OutOfRangeError
Upvotes: 2