Devin Haslam

Reputation: 759

TensorFlow: Error changing batch size of deep CNN

I have replicated a deep CNN from a research paper. When I originally constructed the model, I assumed that the batch size would be one. However, now that I have learned more about batch sizes, I want to use a batch size of 40.

Here is the GitHub repository.

This is a very deep network, so I will show a more basic version of the project below:

x = tf.placeholder(tf.float32, shape=[None, 7168])
y_ = tf.placeholder(tf.float32, shape=[None, 7168, 3])

#MANY CONVOLUTIONS OMITTED HERE

#one of many transpose convolutions, the 40 here is a change I made for the batch size
w = tf.Variable(tf.constant(1.,shape=[2,2,4,1,192]))
DeConnv1 = tf.nn.conv3d_transpose(layer1, filter = w, output_shape = [40,32,32,7,1], strides = [1,2,2,2,1], padding = 'SAME')

#I reshape the final convolution's batch size because I was getting errors
final = tf.reshape(final, [40, 7168])

#Accuracy and loss functions omitted because they do not deal with batch size

#Lastly, I train the model, where a and b are size [40][7168][3]; 40 is the batch size
train_step.run(feed_dict={x: a, y_: b, keep_prob: .5})

When I run the code from the repository, I get this error. What more changes do I need to make so that the batch size is 40?

Traceback (most recent call last):
  File "<stdin>", line 31, in <module>
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2042, in run
    _run_using_default_session(self, feed_dict, self.graph, session)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 4490, in _run_using_default_session
    session.run(operation, feed_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: cuDNN Backward Data function launch failure : input shape([320,4,4,1,896]) filter shape([3,3,1,896,800])
         [[Node: gradients/conv3d_2/Conv3D_grad/Conv3DBackpropInputV2 = Conv3DBackpropInputV2[T=DT_FLOAT, data_format="NDHWC", padding="VALID", strides=[1, 1, 1, 1, 1], _device="/job:localhost/replica:0/task:0/device:GPU:0"](gradients/conv3d_2/Conv3D_grad/Shape, conv3d_1/kernel/read, gradients/conv3d_2/BatchToSpaceND_grad/SpaceToBatchND)]]

Caused by op u'gradients/conv3d_2/Conv3D_grad/Conv3DBackpropInputV2', defined at:
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 343, in minimize
    grad_loss=grad_loss)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 414, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 581, in gradients
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 353, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 581, in <lambda>
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_grad.py", line 82, in _Conv3DGrad
    data_format=data_format),
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 1084, in conv3d_backprop_input_v2
    data_format=data_format, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

...which was originally created as op u'conv3d_2/Conv3D', defined at:
  File "<stdin>", line 2, in <module>
  File "<stdin>", line 2, in conv3d_dilation
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/convolutional.py", line 809, in conv3d
    return layer.apply(inputs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 671, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 575, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/convolutional.py", line 167, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 835, in __call__
    return self.conv_op(inp, filter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 499, in __call__
    return self.call(inp, filter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 492, in _with_space_to_batch_call
    result = self.op(input_converted, filter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 187, in __call__
    name=self.name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 847, in conv3d
    padding=padding, data_format=data_format, name=name)

InternalError (see above for traceback): cuDNN Backward Data function launch failure : input shape([320,4,4,1,896]) filter shape([3,3,1,896,800])
         [[Node: gradients/conv3d_2/Conv3D_grad/Conv3DBackpropInputV2 = Conv3DBackpropInputV2[T=DT_FLOAT, data_format="NDHWC", padding="VALID", strides=[1, 1, 1, 1, 1], _device="/job:localhost/replica:0/task:0/device:GPU:0"](gradients/conv3d_2/Conv3D_grad/Shape, conv3d_1/kernel/read, gradients/conv3d_2/BatchToSpaceND_grad/SpaceToBatchND)]]

Upvotes: 1

Views: 1377

Answers (2)

Maxim

Reputation: 53788

Try this:

shape = tf.shape(tf.reshape(x, [-1, 32, 32, 7, 1]))
DeConnv1 = tf.nn.conv3d_transpose(layer1, filter=w, output_shape=shape, strides=[1,2,2,2,1], padding='SAME')

final = tf.reshape(final, [-1, 7168])

This way you don't hard-code 40 in the model and can feed any batch size you want, including 40.
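
As a rough illustration, the shape passed as output_shape can be checked in plain Python, without TensorFlow. The sketch below relies on the rule that with 'SAME' padding each input spatial size must equal ceil(output size / stride). It also assumes layer1 has shape [batch, 16, 16, 4, 192], which the question does not state, and the helper name transpose_output_shape is hypothetical.

```python
def transpose_output_shape(input_shape, out_spatial, out_channels, strides):
    """Build an output_shape for a 'SAME'-padded transposed convolution.

    input_shape:  [batch, d1, d2, d3, in_channels] of the incoming tensor.
    out_spatial:  the three spatial sizes wanted in the output.
    out_channels: number of channels in the output.
    strides:      the three spatial strides.
    The batch size is copied from the input instead of being hard-coded.
    """
    batch = input_shape[0]
    spatial = input_shape[1:4]
    for size_in, size_out, stride in zip(spatial, out_spatial, strides):
        # Ceiling division written with integers only.
        expected_in = -(-size_out // stride)
        # With 'SAME' padding the matching forward convolution maps
        # size_out to ceil(size_out / stride), which must equal size_in.
        if expected_in != size_in:
            raise ValueError(
                "output size %d with stride %d does not map back to input size %d"
                % (size_out, stride, size_in))
    return [batch] + list(out_spatial) + [out_channels]


# Assumed layer1 shape (not given in the question): [batch, 16, 16, 4, 192].
for batch in (1, 40):
    print(transpose_output_shape([batch, 16, 16, 4, 192], [32, 32, 7], 1, [2, 2, 2]))
# prints [1, 32, 32, 7, 1] and then [40, 32, 32, 7, 1]
```

Only the first entry changes with the batch size, which is why taking it from the input tensor at run time lets the same graph accept both 1 and 40.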

Upvotes: 1

RobR

Reputation: 2190

Best practice is to avoid hard-coding the batch size in the graph. As discussed here (see "How do I build a graph that works with variable batch sizes?"), you should specify shapes as [None, nx, ny, nz] and retrieve the batch size using tf.shape(input)[0]. For reshaping, you can use a form like tf.reshape(input, [-1, nx, ny, nz]), where the -1 tells TensorFlow to work out the batch dimension at run time.
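
To make the -1 rule concrete, here is a small plain-Python sketch of how the missing dimension can be inferred from the element count. It mirrors the behaviour described above but is not TensorFlow's implementation, and the function name resolve_reshape is hypothetical.

```python
def resolve_reshape(input_shape, target_shape):
    """Return target_shape with a single -1 replaced by the inferred size."""
    if target_shape.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")

    # Total number of elements in the tensor being reshaped.
    total = 1
    for dim in input_shape:
        total *= dim

    # Product of the dimensions that were given explicitly.
    known = 1
    for dim in target_shape:
        if dim != -1:
            known *= dim

    if -1 not in target_shape:
        if known != total:
            raise ValueError("element counts differ: %d vs %d" % (total, known))
        return list(target_shape)

    if known == 0 or total % known != 0:
        raise ValueError("cannot fit %d elements into %r" % (total, target_shape))
    return [total // known if dim == -1 else dim for dim in target_shape]


# A batch of 40 flattened inputs, reshaped to the 5-D layout from the question.
print(resolve_reshape([40, 7168], [-1, 32, 32, 7, 1]))  # [40, 32, 32, 7, 1]
# A batch of 1 flattened back to [batch, 7168].
print(resolve_reshape([1, 32, 32, 7, 1], [-1, 7168]))   # [1, 7168]
```

Because 32 * 32 * 7 * 1 = 7168, the same target shape works for any batch size, whereas a hard-coded [40, 7168] only matches a batch of exactly 40.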

Upvotes: 0
