multi-level feature fusion in tensorflow

Question

I want to know how can I combine two layers with different spatial space in Tensorflow.

for example::

batch_size = 3

input1 = tf.ones([batch_size, 32, 32, 3], tf.float32)
input2 = tf.ones([batch_size, 16, 16, 3], tf.float32)

filt1 = tf.constant(0.1, shape = [3,3,3,64])
filt1_1 = tf.constant(0.1, shape = [1,1,64,64])

filt2 = tf.constant(0.1, shape = [3,3,3,128])
filt2_2 = tf.constant(0.1, shape = [1,1,128,128])

#first layer
conv1 = tf.nn.conv2d(input1, filt1, [1,2,2,1], "SAME")
pool1 = tf.nn.max_pool(conv1, [1,2,2,1],[1,2,2,1], "SAME")

conv1_1 = tf.nn.conv2d(pool1,  filt1_1, [1,2,2,1], "SAME")
deconv1 = tf.nn.conv2d_transpose(conv1_1, filt1_1, pool1.get_shape().as_list(), [1,2,2,1], "SAME")

#seconda Layer
conv2 = tf.nn.conv2d(input2, filt2, [1,2,2,1], "SAME")
pool2 = tf.nn.max_pool(conv2, [1,2,2,1],[1,2,2,1], "SAME")

conv2_2 = tf.nn.conv2d(pool2,  filt2_2, [1,2,2,1], "SAME")
deconv2 = tf.nn.conv2d_transpose(conv2_2, filt2_2, pool2.get_shape().as_list(), [1,2,2,1], "SAME")

The deconv1 shape is [3, 8, 8, 64] and the deconv2 shape is [3, 4, 4, 128]. Here I cannot use the tf.concat to combine the deconv1 and deconv2. So how can I do this???

Edit

This is image for the architecture that I tried to implement:: it is releated to this paper::

vii. He, W., Zhang, X. Y., Yin, F., & Liu, C. L. (2017). Deep Direct Regression for Multi-Oriented Scene Text Detection. arXiv preprint arXiv:1703.08289

Ali · Accepted Answer

I checked the paper you point and there is it, consider the input image to this network has size H x W (height and width), I write the size of the output image on the side of each layer. Now look at the most bottom layer which I circle the input arrows to that layer, let's check it. This layer has two input, the first from the previous layer which has shape H/2 x W/2 and the second from the first pooling layer which also has size H/2 x W/2. These two inputs are merged together (not concatenation, but added together based on paper) and goes into the last Upsample layer, which output image of size H x W.

The other Upsample layers also have the same inputs. As you can see all merging operations have the match shapes. Also, the filter number for all merging layers is 128 which has consistency with others.

You can also use concat instead of merging, but it results in a larger filter number, be careful about that. i.e. merging two matrices with shapes H/2 x W/2 x 128 results in the same shape H/2 x W/2 x 128, but concat two matrices on the last axis, with shapes H/2 x W/2 x 128 results in H/2 x W/2 x 256.

I tried to guide you as much as possible, hope that was useful.

multi-level feature fusion in tensorflow

Answers (1)

Related Questions