Reputation: 24121
I want to implement a Siamese Convolutional Neural Network, where two images share weights in the convolutional layers and are then concatenated before being passed through the fully-connected layers. I have tried an implementation, but it seems like rather a "hacked" solution. In particular, I have defined an operation on tensors simply as a Python function, and I'm not sure whether this is allowed.
Here is the code I have tried:
import tensorflow as tf

images = tf.placeholder(tf.float32, shape=[None, 64 * 64])
# Convolutional layers
# ...
# ...
# Results in pool3_flat, which is the flattened output of the third convolutional layer
pool3_flat = tf.reshape(pool3, [-1, 8 * 8 * 128])
# Now, merge the image pairs, where each pair is composed of adjacent images in the batch, with a stride of 2
def merge_pairs():
    # Create a tensor to store the merged image pairs.
    # The batch size is 128, therefore there will be 64 pairs
    # (64 in the first dimension of this tensor)
    merged_pairs = tf.Variable(tf.zeros([64, 8 * 8 * 128]))
    # Split the images into 64 pairs
    pairs = tf.split(0, 64, pool3_flat)
    # For each pair, concatenate the two images across dimension 1,
    # and set this tensor in the appropriate row of merged_pairs
    for pair_num, pair in enumerate(pairs):
        merged_pair = tf.concat(1, pair)
        merged_pairs[pair_num] = merged_pair
    return merged_pairs
# Proceed with operations on the merged_pairs tensor, as if the batch size were 64
fc4 = tf.matmul(merge_pairs(), weights4)
# ...
# ...
Whilst this compiles and seems to run, the results are not as expected. So, I'm wondering: is there a better way to implement a Siamese network using built-in operations in TensorFlow?
Upvotes: 4
Views: 3271
Reputation: 66
You can make use of tf.pack and tf.unpack, somewhat like:
pairs = tf.pack(tf.split(0, 64, pool3_flat))                  # shape [64, 2, 8*8*128]
left, right = tf.unpack(tf.transpose(pairs, perm=[1, 0, 2]))  # two tensors, each [64, 8*8*128]
merged_pairs = tf.concat(1, [left, right])                    # shape [64, 2*8*8*128]
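As a quick sanity check (a hypothetical, self-contained snippet using the same pre-1.0 API as above, with small made-up shapes), you can confirm that adjacent rows of the batch end up concatenated side by side:

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[4, 3])  # 4 "images" of 3 features, i.e. 2 pairs
pairs = tf.pack(tf.split(0, 2, x))            # shape [2, 2, 3]
left, right = tf.unpack(tf.transpose(pairs, perm=[1, 0, 2]))
merged = tf.concat(1, [left, right])          # shape [2, 6]

with tf.Session() as sess:
    print(sess.run(merged, feed_dict={x: np.arange(12).reshape(4, 3).astype(np.float32)}))
    # Row 0 is images 0 and 1 concatenated; row 1 is images 2 and 3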
A cleaner way to do this is to keep your pairs separate from the beginning, so that you can define two networks and use the same trainable variables in each network.
You would have something like (skipping the convolutional layers):
image_left = tf.placeholder(tf.float32, shape=[None, 64, 64, 1])
image_right = tf.placeholder(tf.float32, shape=[None, 64, 64, 1])
pool_left = tf.nn.max_pool(image_left, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
pool_right = tf.nn.max_pool(image_right, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
pool_flat_left = tf.reshape(pool_left, [-1, 32*32])
pool_flat_right = tf.reshape(pool_right, [-1, 32*32])
Then simply concatenate left and right along dimension 1; the resulting concat_layer has shape [batch_size, 2 * 32 * 32].
concat_layer = tf.concat(1, [pool_flat_left, pool_flat_right])
This way you can also vary the batch size later. Make sure to use the same weights and biases on each side (left and right); one way to do that is sketched below.
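A minimal sketch of such weight sharing, assuming an illustrative 5x5 convolution with 32 output channels (the variable names and filter shape here are hypothetical, not part of the answer above). Defining the Variables once and applying them in both branches guarantees the sharing by construction:

import tensorflow as tf

# Hypothetical filter shape: 5x5 kernel, 1 input channel, 32 output channels
conv_weights = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
conv_biases = tf.Variable(tf.zeros([32]))

def branch(image):
    # Both calls below use the exact same Variable objects, so the
    # left and right branches share their weights and biases
    conv = tf.nn.conv2d(image, conv_weights, strides=[1, 1, 1, 1], padding='SAME')
    return tf.nn.relu(conv + conv_biases)

branch_left = branch(image_left)
branch_right = branch(image_right)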
Upvotes: 5