Reputation: 25
I'm trying to build a simple multi-head attention layer in TensorFlow 1.14. Each head contains three different conv1d
layers, and I want to use tf.map_fn to compute the heads in parallel.
import tensorflow as tf

n_head = 50  # number of attention heads
conv1d = tf.layers.conv1d
normalize = tf.contrib.layers.instance_norm
activation = tf.nn.elu

f1d = tf.placeholder(shape=(None, 42), dtype=tf.float32)  # input feats
f1ds = tf.tile(f1d[None, ...], [n_head, 1, 1])  # n_head copies, one per attention head

def apply_attention(f1):
    f1 = activation(normalize(f1[None, ...]))
    q = conv1d(f1, 32, 3, padding='same')
    k = conv1d(f1, 32, 3, padding='same')  # [1, ncol, 32]
    v = conv1d(f1, 1, 3, padding='same')   # [1, ncol, 1]
    attention_map = tf.nn.softmax(
        tf.reduce_sum(q[0, None, :, :] * k[0, :, None, :], axis=-1) / (32 ** .5),
        axis=0)  # [ncol, ncol]
    return attention_map * v[0]

f1d_attention = tf.map_fn(lambda x: apply_attention(x), f1ds, dtype=tf.float32)
But when I inspect the variables in this model (one way to print such a listing is sketched after it), it seems there is only one group of conv1d
layers in the whole model:
conv1d/bias/Adam [32]
conv1d/bias/Adam_1 [32]
conv1d/kernel [3, 42, 32]
conv1d/kernel/Adam [3, 42, 32]
conv1d/kernel/Adam_1 [3, 42, 32]
conv1d_1/bias [32]
conv1d_1/bias/Adam [32]
conv1d_1/bias/Adam_1 [32]
conv1d_1/kernel [3, 42, 32]
conv1d_1/kernel/Adam [3, 42, 32]
conv1d_1/kernel/Adam_1 [3, 42, 32]
conv1d_2/bias [1]
conv1d_2/bias/Adam [1]
conv1d_2/bias/Adam_1 [1]
conv1d_2/kernel [3, 42, 1]
conv1d_2/kernel/Adam [3, 42, 1]
conv1d_2/kernel/Adam_1 [3, 42, 1]
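One way to produce a listing like the one above is to print the global variables after building the training op. The scalar loss and the Adam optimizer here are assumptions for illustration; the .../Adam and .../Adam_1 entries imply some Adam train op exists, but it is not part of the code shown above.

loss = tf.reduce_mean(f1d_attention)                    # assumed dummy loss, just for illustration
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)  # creates the .../Adam slot variables

for v in tf.global_variables():
    print(v.op.name, v.shape.as_list())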
What's wrong with my code?
Upvotes: 0
Views: 193
Reputation: 11651
In that case, you don't want to use tf.map_fn. tf.map_fn will evaluate (trace) your function only once and then run each of your inputs through that same graph, effectively reusing the same convolution layers for every head.
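Here is a minimal standalone sketch of that behavior; the shapes and the dense layer are my own example, not taken from the question. Mapping a layer call over a batch creates only one set of variables:

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(4, None, 8))
# tf.map_fn traces the lambda a single time, so the dense layer below is created once
# and every slice x[i] is run through the same kernel/bias.
y = tf.map_fn(lambda t: tf.layers.dense(t, 16), x)
print(tf.global_variables())  # only one dense/kernel and one dense/bias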
You can achieve what you want with a simple for loop:
# Creating a different set of conv layers for each head
multi_head = [apply_attention(f1d) for _ in range(n_head)]
# Stacking the results together on the first axis
f1d_attention = tf.stack(multi_head, axis=0)
I've reduced the number of heads to 2 for readability, but if we look at the variables, we can see that two groups of convolutions have been instantiated.
>>> tf.global_variables()
[<tf.Variable 'InstanceNorm/beta:0' shape=(42,) dtype=float32_ref>,
<tf.Variable 'InstanceNorm/gamma:0' shape=(42,) dtype=float32_ref>,
<tf.Variable 'conv1d/kernel:0' shape=(3, 42, 32) dtype=float32_ref>,
<tf.Variable 'conv1d/bias:0' shape=(32,) dtype=float32_ref>,
<tf.Variable 'conv1d_1/kernel:0' shape=(3, 42, 32) dtype=float32_ref>,
<tf.Variable 'conv1d_1/bias:0' shape=(32,) dtype=float32_ref>,
<tf.Variable 'conv1d_2/kernel:0' shape=(3, 42, 1) dtype=float32_ref>,
<tf.Variable 'conv1d_2/bias:0' shape=(1,) dtype=float32_ref>,
<tf.Variable 'InstanceNorm_1/beta:0' shape=(42,) dtype=float32_ref>,
<tf.Variable 'InstanceNorm_1/gamma:0' shape=(42,) dtype=float32_ref>,
<tf.Variable 'conv1d_3/kernel:0' shape=(3, 42, 32) dtype=float32_ref>,
<tf.Variable 'conv1d_3/bias:0' shape=(32,) dtype=float32_ref>,
<tf.Variable 'conv1d_4/kernel:0' shape=(3, 42, 32) dtype=float32_ref>,
<tf.Variable 'conv1d_4/bias:0' shape=(32,) dtype=float32_ref>,
<tf.Variable 'conv1d_5/kernel:0' shape=(3, 42, 1) dtype=float32_ref>,
<tf.Variable 'conv1d_5/bias:0' shape=(1,) dtype=float32_ref>]
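If you prefer each head's variables grouped under a readable prefix rather than the auto-generated conv1d_N suffixes, you could also wrap each head in its own variable scope. A small sketch, with scope names of my own choosing:

multi_head = []
for i in range(n_head):
    # Each scope creates its own instance-norm and conv variables,
    # named e.g. head_0/conv1d/kernel, head_1/conv1d/kernel, ...
    with tf.variable_scope('head_%d' % i):
        multi_head.append(apply_attention(f1d))
f1d_attention = tf.stack(multi_head, axis=0)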
Side Note: Unless you have a really good reason, you should migrate away from TensorFlow 1 and use TensorFlow 2 instead. Support for TF1 is limited.
Upvotes: 1