Reputation: 863
Question: I converted the basic MNIST example from TensorFlow to a fully convolutional implementation. Now 100 iterations take roughly 20 times longer than before. What causes this?
I took the basic MNIST example from the TensorFlow website and converted the final FC layer to a convolutional layer, inspired by this post by Yann LeCun and this Quora post, or, more generally, by the article Fully Convolutional Networks for Semantic Segmentation.
So I changed this code block:
with tf.name_scope("Fully_Connected") as scope:
W_fc1 = weight_variable([7**2 * 64, 1024], 'Fully_Connected_layer_1')
b_fc1 = bias_variable([1024], 'bias_for_Fully_Connected_Layer_1')
h_pool2_flat = tf.reshape(h_pool2, [-1, 7**2*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
to this code block:
with tf.name_scope("FC_conv") as scope:
W_fcc1 = weight_variable([7,7,64,1024],'FC_conv_1')
b_fcc1 = bias_variable([1024],'bias_for_FC_conv_1')
h_fcc1 = tf.nn.relu(tf.nn.conv2d(h_pool2, W_fcc1, strides=[1, 1, 1, 1], padding='VALID')+b_fcc1)
After this change, 100 iterations take roughly 70 seconds instead of a few seconds: the FC implementation took roughly 5 seconds for 100 iterations, while the fully convolutional implementation takes roughly 70 seconds.
Can someone give me a clue about this? Why does the convolutional implementation take so much more time?
Thanks a lot for your time and answers.
Upvotes: 2
Views: 1272
Reputation: 57893
Your convolution is way too big. It should be more like this:
NUM_CHANNELS = 1
conv1_weights = tf.Variable(
    tf.truncated_normal([5, 5, NUM_CHANNELS, 32],  # 5x5 filter, depth 32.
                        stddev=0.1,
                        seed=SEED))
See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/image/mnist/convolutional.py
Upvotes: 0
Reputation: 1624
Ordinarily, convolutional layers are much more efficient than fully-connected layers, but this is because they offer a way to dramatically reduce the number of parameters that need to be optimized. In this case, if I'm not mistaken, the number of parameters is the same, since the convolutional kernels are being applied to the entire extent of the input. In a sense, it is the locally-connected aspect of convolutional layers that buys you the reduced computational complexity.
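For what it's worth, here is a quick back-of-the-envelope count with the shapes from the question; this is just a sketch of the arithmetic, not code from either implementation:

# Sketch of the parameter counts for the two layers in the question.
# Fully connected: a [7*7*64, 1024] weight matrix plus 1024 biases.
fc_params = (7 * 7 * 64) * 1024 + 1024      # 3,212,288

# FC-as-convolution: a 7x7x64 kernel for each of the 1024 output channels,
# plus 1024 biases -- exactly the same number of weights.
conv_params = (7 * 7 * 64) * 1024 + 1024    # 3,212,288

print(fc_params == conv_params)             # True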
So the big difference is that the convolutional layer has to slide a 7x7 kernel over the input. The conventional wisdom I've heard is that it's best to keep kernel sizes to 3x3 or at most 5x5, and this is in part due to the computational expense of convolving with larger kernels.
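If you want to see where the time goes, a rough micro-benchmark of just these two ops might help. This is only a sketch, written in TF 2.x eager style rather than the Session-based code from the question, and the batch size of 50 is my assumption:

import time
import numpy as np
import tensorflow as tf

# Rough micro-benchmark of the two ops, with the shapes from the question.
# Assumes TF 2.x eager execution; batch size 50 is an assumption.
batch = 50
x = tf.constant(np.random.rand(batch, 7, 7, 64).astype(np.float32))
w_conv = tf.constant(np.random.rand(7, 7, 64, 1024).astype(np.float32))
w_fc = tf.reshape(w_conv, [7 * 7 * 64, 1024])

def fc_layer():
    return tf.matmul(tf.reshape(x, [batch, -1]), w_fc)

def conv_layer():
    return tf.nn.conv2d(x, w_conv, strides=[1, 1, 1, 1], padding='VALID')

for name, fn in [('matmul', fc_layer), ('conv2d', conv_layer)]:
    fn().numpy()                     # warm-up
    start = time.time()
    for _ in range(100):
        fn().numpy()                 # .numpy() blocks until the op finishes
    print(name, time.time() - start, 'seconds for 100 calls')

Since the 7x7 kernel exactly covers the 7x7 input with VALID padding, both ops compute the same dot products, so the comparison mostly measures how efficiently TensorFlow handles a large 7x7 convolution versus a plain matrix multiply on your hardware.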
I haven't personally explored using fully-convolutional networks, but from what I can gather, the purpose is to improve the accuracy of the model, not so much to improve efficiency. Hope that helps.
Upvotes: 1