Reputation: 25
I'm trying to build a simple multi-head attention layer in TensorFlow 1.14. Each head contains three different conv1d
layers, and I want to use tf.map_fn to compute the heads in parallel.
import tensorflow as tf

n_head = 50  # number of attention heads
conv1d = tf.layers.conv1d
normalize = tf.contrib.layers.instance_norm
activation = tf.nn.elu

f1d = tf.placeholder(shape=(None, 42), dtype=tf.float32)  # input feats
f1ds = tf.tile(f1d[None, ...], [n_head, 1, 1])  # n_head copies, one per attention head

def apply_attention(f1):
    f1 = activation(normalize(f1[None, ...]))
    q = conv1d(f1, 32, 3, padding='same')
    k = conv1d(f1, 32, 3, padding='same')  # [1, ncol, 32]
    v = conv1d(f1, 1, 3, padding='same')   # [1, ncol, 1]
    attention_map = tf.nn.softmax(
        tf.reduce_sum(q[0, None, :, :] * k[0, :, None, :], axis=-1) / (32 ** .5),
        axis=0)  # [ncol, ncol]
    return attention_map * v[0]

f1d_attention = tf.map_fn(lambda x: apply_attention(x), f1ds, dtype=tf.float32)
But when I inspect the variables in this model (one way to print such a listing is sketched after it), it seems there is only one group of conv1d
layers in the whole model:
conv1d/bias/Adam [32]
conv1d/bias/Adam_1 [32]
conv1d/kernel [3, 42, 32]
conv1d/kernel/Adam [3, 42, 32]
conv1d/kernel/Adam_1 [3, 42, 32]
conv1d_1/bias [32]
conv1d_1/bias/Adam [32]
conv1d_1/bias/Adam_1 [32]
conv1d_1/kernel [3, 42, 32]
conv1d_1/kernel/Adam [3, 42, 32]
conv1d_1/kernel/Adam_1 [3, 42, 32]
conv1d_2/bias [1]
conv1d_2/bias/Adam [1]
conv1d_2/bias/Adam_1 [1]
conv1d_2/kernel [3, 42, 1]
conv1d_2/kernel/Adam [3, 42, 1]
conv1d_2/kernel/Adam_1 [3, 42, 1]
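One way to produce a listing like the one above is to print the global variables after building the training op. The scalar loss and the Adam optimizer here are assumptions for illustration; the .../Adam and .../Adam_1 entries imply some Adam train op exists, but it is not part of the code shown above.

loss = tf.reduce_mean(f1d_attention)                    # assumed dummy loss, just for illustration
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)  # creates the .../Adam slot variables

for v in tf.global_variables():
    print(v.op.name, v.shape.as_list())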
What's wrong with my code?
Upvotes: 0
Views: 193
Reputation: 11651
In that case, you don't want to use tf.map_fn. tf.map_fn will evaluate (trace) your function only once and then run each of your inputs through that same graph, effectively reusing the same convolution layers for every head.
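Here is a minimal standalone sketch of that behavior; the shapes and the dense layer are my own example, not taken from the question. Mapping a layer call over a batch creates only one set of variables:

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(4, None, 8))
# tf.map_fn traces the lambda a single time, so the dense layer below is created once
# and every slice x[i] is run through the same kernel/bias.
y = tf.map_fn(lambda t: tf.layers.dense(t, 16), x)
print(tf.global_variables())  # only one dense/kernel and one dense/bias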
You can achieve what you want with a simple for loop:
# Creating a different set of conv layers for each head
multi_head = [apply_attention(f1d) for _ in range(n_head)]
# Stacking the results together on the first axis
f1d_attention = tf.stack(multi_head, axis=0)
I've reduced the number of heads to 2 for readability, but if we look at the variables, we can see that two groups of convolutions have been instantiated.
>>> tf.global_variables()
[<tf.Variable 'InstanceNorm/beta:0' shape=(42,) dtype=float32_ref>,
<tf.Variable 'InstanceNorm/gamma:0' shape=(42,) dtype=float32_ref>,
<tf.Variable 'conv1d/kernel:0' shape=(3, 42, 32) dtype=float32_ref>,
<tf.Variable 'conv1d/bias:0' shape=(32,) dtype=float32_ref>,
<tf.Variable 'conv1d_1/kernel:0' shape=(3, 42, 32) dtype=float32_ref>,
<tf.Variable 'conv1d_1/bias:0' shape=(32,) dtype=float32_ref>,
<tf.Variable 'conv1d_2/kernel:0' shape=(3, 42, 1) dtype=float32_ref>,
<tf.Variable 'conv1d_2/bias:0' shape=(1,) dtype=float32_ref>,
<tf.Variable 'InstanceNorm_1/beta:0' shape=(42,) dtype=float32_ref>,
<tf.Variable 'InstanceNorm_1/gamma:0' shape=(42,) dtype=float32_ref>,
<tf.Variable 'conv1d_3/kernel:0' shape=(3, 42, 32) dtype=float32_ref>,
<tf.Variable 'conv1d_3/bias:0' shape=(32,) dtype=float32_ref>,
<tf.Variable 'conv1d_4/kernel:0' shape=(3, 42, 32) dtype=float32_ref>,
<tf.Variable 'conv1d_4/bias:0' shape=(32,) dtype=float32_ref>,
<tf.Variable 'conv1d_5/kernel:0' shape=(3, 42, 1) dtype=float32_ref>,
<tf.Variable 'conv1d_5/bias:0' shape=(1,) dtype=float32_ref>]
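If you prefer each head's variables grouped under a readable prefix rather than the auto-generated conv1d_N suffixes, you could also wrap each head in its own variable scope. A small sketch, with scope names of my own choosing:

multi_head = []
for i in range(n_head):
    # Each scope creates its own instance-norm and conv variables,
    # named e.g. head_0/conv1d/kernel, head_1/conv1d/kernel, ...
    with tf.variable_scope('head_%d' % i):
        multi_head.append(apply_attention(f1d))
f1d_attention = tf.stack(multi_head, axis=0)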
Side Note: Unless you have a really good reason, you should migrate away from TensorFlow 1 and use TensorFlow 2 instead. Support for TF1 is limited.
Upvotes: 1