Reputation: 11002
I have a convolutional neural network with two different output streams:
                 input
                   |
                 (...)        <-- several convolutional layers
                   |
            _______________
           |               |
    (several layers)  (several layers)
    fully-connected   fully-connected
           |               |
    output stream 1   output stream 2
I would like to compute stream 1 on /gpu:0 and stream 2 on /gpu:1. Unfortunately, I was not able to set it up properly.
This attempt:
...placeholders...
...conv layers...

with tf.device("/gpu:0"):
    ...stream 1 layers...
    nn_out_1 = tf.matmul(...)

with tf.device("/gpu:1"):
    ...stream 2 layers...
    nn_out_2 = tf.matmul(...)
This runs dead slow (slower than training on a single GPU alone) and sometimes produces NaN values in the output. I thought this might be because the with statements were not synchronized properly, so I added control_dependencies and placed the conv layers on /gpu:0 explicitly:
...placeholders... # x -> input, y -> labels

with tf.device("/gpu:0"):
    with tf.control_dependencies([x, y]):
        ...conv layers...
        h_conv_flat = tf.reshape(h_conv_last, ...)

with tf.device("/gpu:0"):
    with tf.control_dependencies([h_conv_flat]):
        ...stream 1 layers...
        nn_out_1 = tf.matmul(...)

with tf.device("/gpu:1"):
    with tf.control_dependencies([h_conv_flat]):
        ...stream 2 layers...
        nn_out_2 = tf.matmul(...)
...but with this approach the network doesn't even run. No matter what I tried, it complained about the input not being fed:
tensorflow.python.framework.errors.InvalidArgumentError:
    You must feed a value for placeholder tensor 'x' with dtype float
    [[Node: x = Placeholder[dtype=DT_FLOAT, shape=[],
      _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Without the with statements, the network trains on /gpu:0 only and runs fine: it learns reasonable things and raises no errors.
What am I doing wrong? Is TensorFlow unable to split different streams of layers within one network across different GPUs? Do I always have to split the complete network into separate towers?
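To make this easier to reproduce, here is a stripped-down, self-contained version of what I am trying to do (a single conv layer and made-up shapes stand in for the real network):

import numpy as np
import tensorflow as tf

# Made-up input shape; the real network has several conv layers.
x = tf.placeholder(tf.float32, shape=[None, 28, 28, 1], name='x')

# Shared convolutional trunk (device left to TensorFlow's default placement).
W_conv = tf.Variable(tf.truncated_normal([5, 5, 1, 8], stddev=0.1))
h_conv = tf.nn.relu(tf.nn.conv2d(x, W_conv, strides=[1, 1, 1, 1], padding='SAME'))
h_conv_flat = tf.reshape(h_conv, [-1, 28 * 28 * 8])

# Stream 1 on the first GPU.
with tf.device('/gpu:0'):
    W1 = tf.Variable(tf.truncated_normal([28 * 28 * 8, 10], stddev=0.1))
    nn_out_1 = tf.matmul(h_conv_flat, W1)

# Stream 2 on the second GPU.
with tf.device('/gpu:1'):
    W2 = tf.Variable(tf.truncated_normal([28 * 28 * 8, 10], stddev=0.1))
    nn_out_2 = tf.matmul(h_conv_flat, W2)

sess = tf.Session()
sess.run(tf.initialize_all_variables())
out1, out2 = sess.run([nn_out_1, nn_out_2],
                      feed_dict={x: np.zeros((4, 28, 28, 1), dtype=np.float32)})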
Upvotes: 4
Views: 1572
Reputation: 53
There is an example of how to use multiple GPUs in one network at https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/image/cifar10/cifar10_multi_gpu_train.py; you can probably copy that code. You can also do something like this:
import tensorflow as tf

# Creates a graph.
c = []
for d in ['/gpu:2', '/gpu:3']:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
    total = tf.add_n(c)  # renamed from `sum` to avoid shadowing the builtin
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(total))
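With log_device_placement set to True, the session prints the device each op was assigned to, so you can verify that the two matmuls really ended up on /gpu:2 and /gpu:3.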
See also: https://www.tensorflow.org/versions/r0.7/how_tos/using_gpu/index.html#using-multiple-gpus
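Mapped onto your two streams, the same pattern might look roughly like the sketch below (the dense helper and all shapes are made up for illustration; allow_soft_placement=True makes TensorFlow fall back to the CPU for ops that have no GPU kernel instead of failing):

import tensorflow as tf

def dense(h, n_out):
    # Stand-in for one of your fully-connected streams.
    n_in = h.get_shape()[1].value
    W = tf.Variable(tf.truncated_normal([n_in, n_out], stddev=0.1))
    b = tf.Variable(tf.zeros([n_out]))
    return tf.matmul(h, W) + b

# h_conv_flat stands for the flattened output of the shared conv layers.
h_conv_flat = tf.placeholder(tf.float32, shape=[None, 1024])

# One stream per GPU, both reading the same shared tensor.
outputs = []
for device in ['/gpu:0', '/gpu:1']:
    with tf.device(device):
        outputs.append(dense(h_conv_flat, 10))
nn_out_1, nn_out_2 = outputs

sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                        log_device_placement=True))
sess.run(tf.initialize_all_variables())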
Best Regards
Upvotes: 2