Reputation: 33
I am solving a text classification problem, following the paper (Kim, 2014).
I found that of the two models below, the model on the left (Model 1) takes about 2.5 times longer to run than the model on the right (Model 2).
I believe the two models have the same number of weight parameters.
Why is there such a time difference between the two models?
*The contents of the input data are the same for both models; only the shape is changed.
I used tf.nn.conv2d. The filter shapes and strides are as follows:
model 1: 3 x 9 x 1 x (the number of filters), stride 3
model 2: 1 x 9 x 3 x (the number of filters), stride 1
Everything else is the same.
*In the image above, width means 'self.embedding_dim' and height means 'self.max_length'.
pooled_outputs = []
with tf.name_scope("conv-maxpool-3"):
    # Convolution layer
    filter_shape = [3, self.embedding_dim, 1, self.num_filters]
    W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
    b = tf.Variable(tf.constant(0.1, shape=[self.num_filters]), name="b")
    conv = tf.nn.conv2d(
        self.embedded_chars_expanded,
        W,
        strides=[1, 1, 3, 1],
        padding="VALID",
        name="conv")
    # Apply nonlinearity
    h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
    # Max-pooling over the outputs
    pooled = tf.nn.max_pool(
        h,
        ksize=[1, self.max_length - 3 + 1, 1, 1],
        strides=[1, 1, 1, 1],
        padding='VALID',
        name="pool")
    pooled_outputs.append(pooled)
----------------------------------------------------------------------
pooled_outputs = []
with tf.name_scope("conv-maxpool-1"):
    # Convolution layer
    filter_shape = [1, self.embedding_dim, 3, self.num_filters]
    W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
    b = tf.Variable(tf.constant(0.1, shape=[self.num_filters]), name="b")
    conv = tf.nn.conv2d(
        self.embedded_chars_expanded,
        W,
        strides=[1, 1, 1, 1],
        padding="VALID",
        name="conv")
    # Apply nonlinearity
    h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
    # Max-pooling over the outputs
    pooled = tf.nn.max_pool(
        h,
        ksize=[1, self.max_length - 1 + 1, 1, 1],
        strides=[1, 1, 1, 1],
        padding='VALID',
        name="pool")
    pooled_outputs.append(pooled)
Upvotes: 2
Views: 145
Reputation: 2860
In the first model you set the stride to [1, 1, 3, 1], and since you don't specify the data format, it defaults to NHWC, i.e. (num_batches, height, width, channels) (check the docs). So the stride of 3 applies to the width, not the height, as your picture of model 1 indicates. And because you are using VALID padding with a filter that spans the full embedding width, the stride of 3 on the width has no effect anyway: there is only one valid horizontal position.
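You can check that width arithmetic with the VALID output-size formula, output = floor((input - filter) / stride) + 1. A small sketch, using the embedding width of 9 from the question's filter shapes:

```python
# Output size of a VALID-padded convolution along one dimension:
# floor((input_size - filter_size) / stride) + 1
def valid_out(input_size, filter_size, stride):
    return (input_size - filter_size) // stride + 1

embedding_dim = 9  # filter width equals the input width in both models

# The output width is 1 whether the width stride is 3 or 1,
# because the filter already covers the entire width.
print(valid_out(embedding_dim, embedding_dim, 3))  # 1
print(valid_out(embedding_dim, embedding_dim, 1))  # 1
```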
So basically, your depiction of model 1 is wrong: in step 2 the filter doesn't jump to the 4th row, but moves to the 2nd row. This means model 1 computes about three times as many convolutions as model 2.
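The factor of roughly three follows from the same VALID output-size formula applied to the height. A sketch with a hypothetical max_length of 60 (the real value isn't in the question; any multiple of 3 shows the same ratio):

```python
# Output size of a VALID-padded convolution along one dimension
def valid_out(input_size, filter_size, stride):
    return (input_size - filter_size) // stride + 1

max_length = 60  # hypothetical sequence length

# Model 1: filter height 3, but the effective height stride is 1
model1_positions = valid_out(max_length, 3, 1)       # 58
# Model 2: rows packed into 3 channels, so the input height is max_length / 3
model2_positions = valid_out(max_length // 3, 1, 1)  # 20

print(model1_positions / model2_positions)           # ~2.9, i.e. about 3x

# What was presumably intended for model 1: stride 3 on the height,
# i.e. strides=[1, 3, 1, 1] in NHWC, which matches model 2's count
print(valid_out(max_length, 3, 3))                   # 20
```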
There are other factors that could contribute to the difference in speed - maybe model 2 can be parallelized better on the GPU - but that is hard to judge.
Upvotes: 1