Reputation: 1033
In TensorFlow 2.x, the default merge_mode of the Bidirectional layer is 'concat', as shown below.
tf.keras.layers.Bidirectional(
layer, merge_mode='concat', weights=None, backward_layer=None, **kwargs
)
But why is fb_out not a concatenation of f_out and b_out, as the test code below demonstrates?
>>> import copy
>>> import tensorflow as tf
>>> from tensorflow.keras.layers import LSTM, Bidirectional
>>> inputs = tf.random.normal([1, 5, 10])
>>> forward_layer = LSTM(1, return_sequences=True)
>>> backward_layer = LSTM(1, return_sequences=True, go_backwards=True)
>>> f_copy = copy.deepcopy(forward_layer)
>>> b_copy = copy.deepcopy(backward_layer)
>>> fb = Bidirectional(forward_layer, backward_layer=backward_layer)
>>> f_out = f_copy(inputs)
>>> b_out = b_copy(inputs)
>>> fb_out = fb(inputs)
>>> f_out
<tf.Tensor: shape=(1, 5, 1), dtype=float32, numpy=
array([[[ 0.11658007],
[-0.0704283 ],
[-0.17762654],
[ 0.0304627 ],
[-0.19515464]]], dtype=float32)>
>>> b_out
<tf.Tensor: shape=(1, 5, 1), dtype=float32, numpy=
array([[[-0.18902111],
[-0.00259904],
[ 0.23515013],
[ 0.22268802],
[ 0.4035125 ]]], dtype=float32)>
>>> fb_out
<tf.Tensor: shape=(1, 5, 2), dtype=float32, numpy=
array([[[ 0.21822408, 0.07384206],
[ 0.0036808 , -0.0700341 ],
[-0.11105614, -0.38493848],
[-0.13826807, -0.12408008],
[ 0.05806111, -0.05853282]]], dtype=float32)>
Upvotes: 1
Views: 643
Reputation: 1
I may be late with this response, but I hope it helps someone else.
The Bidirectional layer in Keras does behave like merging (sum/concat, etc.) two regular LSTMs, but it does not keep the weight initialization in sync with your standalone layers, which is why you get different outputs.
For example, take a simple input with 3 timesteps and 2 features per step.
import tensorflow as tf
unit = 1
dim = 2
timestamp = 3
inputs = tf.random.normal([1, timestamp, dim])
Let's create three layers: a forward LSTM, a backward LSTM, and a Bidirectional one.
forward_layer = tf.keras.layers.LSTM(unit, return_sequences=True)
backward_layer = tf.keras.layers.LSTM(unit, return_sequences=True,
                                      go_backwards=True)
fb = tf.keras.layers.Bidirectional(forward_layer, backward_layer=backward_layer)
forward_layer.build((None, timestamp, dim))
backward_layer.build((None, timestamp, dim))
fb.build((None, timestamp, dim))
Check the initialized weights and you'll see that the Bidirectional layer created a new set of weights for the forward part but reused the same set for the backward part. You can get identical results if you copy the weights across accordingly.
# kernel, recurrent kernel and bias of each standalone LSTM
a, b, c = forward_layer.get_weights()
a1, b1, c1 = backward_layer.get_weights()
# the Bidirectional layer stores its forward weights followed by its backward weights
a2, b2, c2, a3, b3, c3 = fb.get_weights()
# overwrite them with the standalone layers' weights
fb.set_weights([a, b, c, a1, b1, c1])
forward_layer(inputs)
array([[[ 0.0342516 ],
[ 0.0213093 ],
[-0.06462004]]], dtype=float32)>
backward_layer(inputs)
array([[[-0.08782256],
[-0.16806953],
[-0.17708375]]], dtype=float32)>
fb(inputs)
array([[[ 0.0342516 , -0.17708375],
[ 0.0213093 , -0.16806953],
[-0.06462004, -0.08782256]]], dtype=float32)>
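Notice that the second column of fb(inputs) is the backward_layer output in reverse order: with return_sequences=True, Bidirectional flips the backward sequence back so that it lines up with the forward timeline before merging. As a quick sanity check (a minimal sketch of my own, assuming the setup and copied weights above under TF 2.x), you can verify that the merged output is just the forward output concatenated with the time-reversed backward output:
import numpy as np

f_out = forward_layer(inputs)
b_out = backward_layer(inputs)                     # reversed in time because go_backwards=True
b_aligned = tf.reverse(b_out, axis=[1])            # realign to forward time order
expected = tf.concat([f_out, b_aligned], axis=-1)  # default merge_mode='concat'
print(np.allclose(fb(inputs).numpy(), expected.numpy(), atol=1e-6))  # should print True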
Upvotes: 0
Reputation: 1600
The principle of Bidirectional is not as simple as taking the sum of the forward and backward passes. The input is processed in both directions, and each direction's output is shaped by the tanh and sigmoid gates of its LSTM before the two are combined. Therefore, if you split it into two separate processes, the result cannot be the same: one case learns its weights as a single bidirectional layer on the raw input, the other uses two separately initialized layers.
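As an aside, here is a minimal sketch of my own (not this answer's code) showing how the merge_mode argument controls the way the two directional outputs are combined, each direction having already been processed by its own LSTM gates:
import tensorflow as tf

inputs = tf.random.normal([1, 5, 10])

# merge_mode can be 'concat' (the default), 'sum', 'mul', 'ave', or None
for mode in ['concat', 'sum', 'mul', 'ave', None]:
    bi = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(4, return_sequences=True), merge_mode=mode)
    out = bi(inputs)
    # with merge_mode=None the two directions come back as separate tensors
    shapes = [t.shape for t in out] if mode is None else out.shape
    print(mode, shapes)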
Upvotes: 2