I am somewhat new to Tensorflow, and was wondering what kind of performance considerations I should keep in mind when constructing a graph.
My main question is whether there is any change in the performance of a computation when multiple operations are nested at a single node, compared to assigning each operation to a separate node. For example, if I want to use batch normalization, followed by a dense layer and a ReLU, I could structure it so that all three operations are performed at a single node:
input=tf.placeholder(shape=[None,input_len],dtype=tf.float32)
output=tf.nn.relu(tf.matmul(tf.contrib.layers.batch_norm(input),W)+b)
or I could separate them into three separate nodes:
input=tf.placeholder(shape=[None,input_len],dtype=tf.float32)
x1=tf.contrib.layers.batch_norm(input)
x2=tf.matmul(x1,W)+b
output=tf.nn.relu(x2)
Obviously this affects the compactness/readability of the code, but does it also affect how TF builds the graph and runs the computations? Is nesting operations at a single node discouraged, and if so, is it because of performance issues or just style?
If it makes a difference, I am interested in running my computations on a GPU.
Both code fragments will generate identical TensorFlow graphs, and the resulting graphs will have the same performance characteristics.
To validate this assertion, you can inspect the tf.GraphDef protocol buffer that TensorFlow builds by calling print(tf.get_default_graph().as_graph_def()) after running either code fragment. Intermediate Python variables such as x1 and x2 are just references to tensors in the graph; they do not add nodes or change what gets executed.
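To see why the two styles are structurally identical, here is a toy expression-graph sketch (illustrative only, not TensorFlow's actual implementation): whether you nest the calls or bind intermediates to names, the same nodes with the same inputs are created.

```python
# Minimal expression-graph sketch. Node names (MatMul, Add, Relu,
# Placeholder, Variable) mirror TensorFlow op names for illustration.

class Node:
    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = tuple(inputs)

    def __repr__(self):
        return f"{self.op}({', '.join(map(repr, self.inputs))})"

def matmul(a, b): return Node("MatMul", (a, b))
def add(a, b):    return Node("Add", (a, b))
def relu(a):      return Node("Relu", (a,))

x = Node("Placeholder")
W = Node("Variable")
b = Node("Variable")

# Nested form: one expression at a "single node" in the source code.
nested = relu(add(matmul(x, W), b))

# Separated form: named intermediates, one per line.
x1 = matmul(x, W)
x2 = add(x1, b)
separated = relu(x2)

# Both describe the identical graph structure.
print(repr(nested) == repr(separated))  # True
```

Python evaluates the nested call from the inside out, so it constructs exactly the same sequence of nodes as the three-line version; only the source-level readability differs.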