Jong Chan Park

Reputation: 111

tf.nn.depthwise_conv2d is too slow. Is that normal?

I am trying out a recent arXiv work called "Factorized CNN",

which mainly argues that spatially separable convolution (depthwise convolution), together with channel-wise linear projection (1x1 conv), can speed up the convolution operation.

(Figure of their conv layer architecture omitted.)

I found that I can implement this architecture with tf.nn.depthwise_conv2d plus a 1x1 convolution, or with tf.nn.separable_conv2d.

Below is my implementation:

import numpy as np
import tensorflow as tf

# conv filter for depthwise convolution: one 3x3 kernel per input channel
# (He-style init; 64 input channels -- the original 32 looked like a leftover)
depthwise_filter = tf.get_variable("depth_conv_w", [3, 3, 64, 1],
    initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / 9 / 64)))
# conv filter for the channel-wise linear projection (1x1 conv)
pointwise_filter = tf.get_variable("point_conv_w", [1, 1, 64, 64],
    initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / 1 / 64)))
conv_b = tf.get_variable("conv_b", [64], initializer=tf.constant_initializer(0.0))
# depthwise convolution, stride 1, channel multiplier 1
conv_tensor = tf.nn.relu(tf.nn.depthwise_conv2d(tensor, depthwise_filter, [1, 1, 1, 1], padding='SAME'))
# linear channel projection with 1x1 convolution
conv_tensor = tf.nn.bias_add(tf.nn.conv2d(conv_tensor, pointwise_filter, [1, 1, 1, 1], padding='VALID'), conv_b)
# residual connection
tensor = tf.add(tensor, conv_tensor)

In terms of mult-adds, this should be nearly 8 times cheaper than the original 3x3, 64 -> 64 channel convolution (9·64·64 vs. 9·64 + 64·64), approaching the 9x bound set by the 3x3 kernel.
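The expected saving can be checked with a few lines of arithmetic. This is just the mult-add count per output position for the 64-channel, 3x3 layer above; it says nothing about actual wall-clock time, which is the asker's problem:

```python
# Mult-adds per output position for a 3x3, 64 -> 64 layer.
k, c_in, c_out = 3, 64, 64

standard = k * k * c_in * c_out   # dense 3x3 conv:      36864
depthwise = k * k * c_in          # per-channel 3x3:       576
pointwise = c_in * c_out          # 1x1 projection:       4096
separable = depthwise + pointwise #                       4672

print(standard / separable)       # ~7.9x fewer mult-adds
```

The ratio is 1 / (1/c_out + 1/k²), so it approaches k² = 9 only as the channel count grows; at 64 channels it is about 7.9.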

However, I do not see any speed improvement at all.

I have to assume that I am doing something wrong, or that there is a problem with TensorFlow's implementation.

Since there are few examples using depthwise_conv2d, I am leaving this question here.

Is this slow speed normal, or am I making a mistake somewhere?

Upvotes: 11

Views: 3980

Answers (2)

mrgloom

Reputation: 21632

Depthwise convolutions provide significant performance benefits owing to the reduction in both parameters and mult-adds. However, training depthwise convolution layers with GPUs is slow in current deep learning frameworks because their implementations cannot fully utilize the GPU capacity.

https://arxiv.org/pdf/1803.09926.pdf

Upvotes: 3

Zaikun Xu

Reputation: 1473

The current implementation of depthwise conv2d does not fully utilize the parallel power of the GPU; you will need to wait for a faster implementation. In Caffe, for example, there is a faster third-party implementation of this kernel: https://github.com/yonghenglh6/DepthwiseConvolution
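The slowdown is easier to see if you write out what the op computes. Below is a minimal NumPy reference (a sketch for illustration, not TensorFlow's actual kernel, and restricted to stride 1 with 'SAME' padding): a depthwise convolution is C independent k×k convolutions, each with very little arithmetic per memory access, which is exactly the workload shape a naive GPU kernel handles poorly:

```python
import numpy as np

def depthwise_conv2d_ref(x, w):
    """Naive depthwise conv, stride 1, 'SAME' padding.

    x: (H, W, C) input, w: (k, k, C) filters -- each input channel is
    convolved only with its own kxk filter. The per-channel loop makes
    the structure explicit: C small, independent convolutions instead
    of one large matrix multiply.
    """
    h, w_dim, c = x.shape
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))  # zero padding
    out = np.zeros_like(x)
    for ch in range(c):            # C independent small convolutions
        for i in range(h):
            for j in range(w_dim):
                out[i, j, ch] = np.sum(xp[i:i + k, j:j + k, ch] * w[:, :, ch])
    return out
```

A dense convolution of the same spatial size does k²·C multiply-adds per output value and maps well onto one big GEMM; the depthwise version does only k² per output value, so the kernel is memory-bound and the per-channel independence fragments the work.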

Upvotes: 4
