Reputation: 33
I am solving a text classification problem, following the paper (Kim, 2014).
I found that of the two models below, the model on the left (Model 1) takes about 2.5 times longer to run than the model on the right (Model 2).
I believe the two models have the same number of weight parameters.
Why is there such a time difference between the two models?
*The contents of the input data are the same for both models; only the shape is changed.
I used tf.nn.conv2d. The filter shapes and strides are as follows:
model 1: 3 x 9 x 1 x (the number of filters), stride 3
model 2: 1 x 9 x 3 x (the number of filters), stride 1
Everything else is the same.
*In the image above, width means 'self.embedding_dim' and height means 'self.max_length'.
pooled_outputs = []
with tf.name_scope("conv-maxpool-3"):
    # Convolution layer
    filter_shape = [3, self.embedding_dim, 1, self.num_filters]
    W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
    b = tf.Variable(tf.constant(0.1, shape=[self.num_filters]), name="b")
    conv = tf.nn.conv2d(
        self.embedded_chars_expanded,
        W,
        strides=[1, 1, 3, 1],
        padding="VALID",
        name="conv")
    # Apply nonlinearity
    h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
    # Max-pooling over the outputs
    pooled = tf.nn.max_pool(
        h,
        ksize=[1, self.max_length - 3 + 1, 1, 1],
        strides=[1, 1, 1, 1],
        padding='VALID',
        name="pool")
    pooled_outputs.append(pooled)
----------------------------------------------------------------------
pooled_outputs = []
with tf.name_scope("conv-maxpool-1"):
    # Convolution layer
    filter_shape = [1, self.embedding_dim, 3, self.num_filters]
    W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
    b = tf.Variable(tf.constant(0.1, shape=[self.num_filters]), name="b")
    conv = tf.nn.conv2d(
        self.embedded_chars_expanded,
        W,
        strides=[1, 1, 1, 1],
        padding="VALID",
        name="conv")
    # Apply nonlinearity
    h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
    # Max-pooling over the outputs
    pooled = tf.nn.max_pool(
        h,
        ksize=[1, self.max_length - 1 + 1, 1, 1],
        strides=[1, 1, 1, 1],
        padding='VALID',
        name="pool")
    pooled_outputs.append(pooled)
Upvotes: 2
Views: 145
Reputation: 2860
In the first model you set the stride to [1, 1, 3, 1], and since you don't specify the data format, it defaults to NHWC, i.e. (num_batches, height, width, channels) (check the docs). So the stride of 3 applies to the width, not the height, as your picture of model 1 indicates. And because you are using VALID padding with a filter that spans the full embedding width, the stride of 3 on the width has no effect anyway: there is only one valid horizontal position.
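You can check that width arithmetic with the VALID output-size formula, output = floor((input - filter) / stride) + 1. A small sketch, using the embedding width of 9 from the question's filter shapes:

```python
# Output size of a VALID-padded convolution along one dimension:
# floor((input_size - filter_size) / stride) + 1
def valid_out(input_size, filter_size, stride):
    return (input_size - filter_size) // stride + 1

embedding_dim = 9  # filter width equals the input width in both models

# The output width is 1 whether the width stride is 3 or 1,
# because the filter already covers the entire width.
print(valid_out(embedding_dim, embedding_dim, 3))  # 1
print(valid_out(embedding_dim, embedding_dim, 1))  # 1
```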
So basically, your depiction of model 1 is wrong: in step 2 the filter doesn't jump to the 4th row, but moves to the 2nd row. This means model 1 computes about three times as many convolutions as model 2.
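The factor of roughly three follows from the same VALID output-size formula applied to the height. A sketch with a hypothetical max_length of 60 (the real value isn't in the question; any multiple of 3 shows the same ratio):

```python
# Output size of a VALID-padded convolution along one dimension
def valid_out(input_size, filter_size, stride):
    return (input_size - filter_size) // stride + 1

max_length = 60  # hypothetical sequence length

# Model 1: filter height 3, but the effective height stride is 1
model1_positions = valid_out(max_length, 3, 1)       # 58
# Model 2: rows packed into 3 channels, so the input height is max_length / 3
model2_positions = valid_out(max_length // 3, 1, 1)  # 20

print(model1_positions / model2_positions)           # ~2.9, i.e. about 3x

# What was presumably intended for model 1: stride 3 on the height,
# i.e. strides=[1, 3, 1, 1] in NHWC, which matches model 2's count
print(valid_out(max_length, 3, 3))                   # 20
```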
There are other factors that could contribute to the difference in speed - maybe model 2 can be parallelized better on the GPU - but that is hard to judge.
Upvotes: 1