thinkdeep

Reputation: 1033

Why doesn't Tensorflow Bidirectional LSTM Match Forward and Backward LSTM?

In TensorFlow 2.x, the default merge_mode of the Bidirectional layer is 'concat', as shown in the signature below.

tf.keras.layers.Bidirectional(
    layer, merge_mode='concat', weights=None, backward_layer=None, **kwargs
)
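For reference, here is a quick shape check of what merge_mode does (just an illustrative snippet): 'concat' stacks the forward and backward outputs along the last axis, so the feature dimension doubles, while 'sum', 'mul', and 'ave' keep it unchanged.

import tensorflow as tf

x = tf.random.normal([1, 5, 10])

# Default merge_mode='concat' doubles the feature axis; 'sum' keeps it the same.
concat_out = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(3, return_sequences=True))(x)
sum_out = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(3, return_sequences=True), merge_mode='sum')(x)
print(concat_out.shape)  # (1, 5, 6)
print(sum_out.shape)     # (1, 5, 3)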

But why isn't fb_out a concatenation of f_out and b_out, as the test code below shows?

>>> import copy
>>> import tensorflow as tf
>>> from tensorflow.keras.layers import LSTM, Bidirectional
>>> inputs = tf.random.normal([1, 5, 10])
>>> forward_layer = LSTM(1, return_sequences=True)
>>>
>>> backward_layer = LSTM(1, return_sequences=True, go_backwards=True)
>>>
>>> f_copy = copy.deepcopy(forward_layer)
>>>
>>> b_copy = copy.deepcopy(backward_layer)
>>>
>>> fb = Bidirectional(forward_layer, backward_layer=backward_layer)
>>>
>>> f_out = f_copy(inputs)
>>> b_out = b_copy(inputs)
>>>
>>> fb_out = fb(inputs)
>>> f_out
<tf.Tensor: shape=(1, 5, 1), dtype=float32, numpy=
array([[[ 0.11658007],
        [-0.0704283 ],
        [-0.17762654],
        [ 0.0304627 ],
        [-0.19515464]]], dtype=float32)>
>>> b_out
<tf.Tensor: shape=(1, 5, 1), dtype=float32, numpy=
array([[[-0.18902111],
        [-0.00259904],
        [ 0.23515013],
        [ 0.22268802],
        [ 0.4035125 ]]], dtype=float32)>
>>> fb_out
<tf.Tensor: shape=(1, 5, 2), dtype=float32, numpy=
array([[[ 0.21822408,  0.07384206],
        [ 0.0036808 , -0.0700341 ],
        [-0.11105614, -0.38493848],
        [-0.13826807, -0.12408008],
        [ 0.05806111, -0.05853282]]], dtype=float32)>

Upvotes: 1

Views: 643

Answers (2)

yzhong

Reputation: 1

This answer may be late, but I hope it helps anyone else who runs into this...

The Bidirectional layer in Keras does behave like merging (sum/concat, etc.) two regular LSTMs, but it does not reuse the weights of your standalone forward layer; it initializes its own, which is why you get different outputs.

For example, take a simple input with 3 timesteps and 2 features per step.

import tensorflow as tf

unit = 1
dim = 2
timestamp = 3
inputs = tf.random.normal([1, timestamp, dim])

Let's create three layers: a forward LSTM, a backward LSTM, and a Bidirectional one.

forward_layer = tf.keras.layers.LSTM(unit, return_sequences=True)
backward_layer = tf.keras.layers.LSTM(unit, return_sequences=True, 
                                      go_backwards=True)
fb = tf.keras.layers.Bidirectional(forward_layer, backward_layer=backward_layer)

forward_layer.build((None, timestamp, dim))
backward_layer.build((None, timestamp, dim))
fb.build((None, timestamp, dim))

Check the initialized weights and you'll see that the Bidirectional layer initialized a new set of weights for its forward part but reused the same set for the backward part. You can get matching results if you reset the weights accordingly.

a, b, c = forward_layer.get_weights()      # kernel, recurrent_kernel, bias
a1, b1, c1 = backward_layer.get_weights()
a2, b2, c2, a3, b3, c3 = fb.get_weights()  # forward triple, then backward triple

# Copy the standalone layers' weights into the Bidirectional wrapper.
fb.set_weights([a, b, c, a1, b1, c1])
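To make that concrete, here is a quick comparison of the kernels using the names unpacked above (just a sketch; the allclose calls only confirm which arrays match):

import numpy as np

# fb built its own forward weights, so they differ from forward_layer's...
print(np.allclose(a, a2))   # False (snapshots taken before the set_weights call)
# ...but it reuses the backward_layer you passed in, so those already match.
print(np.allclose(a1, a3))  # True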

forward_layer(inputs)
array([[[ 0.0342516 ],
        [ 0.0213093 ],
        [-0.06462004]]], dtype=float32)

backward_layer(inputs)
array([[[-0.08782256],
        [-0.16806953],
        [-0.17708375]]], dtype=float32)

fb(inputs)
array([[[ 0.0342516 , -0.17708375],
        [ 0.0213093 , -0.16806953],
        [-0.06462004, -0.08782256]]], dtype=float32)
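Note in the output above that the backward column of fb is in reversed time order relative to the standalone backward layer: the Bidirectional wrapper flips the backward sequence back so its timesteps line up with the forward ones, while the standalone go_backwards LSTM leaves them in computation order. A minimal check of both halves, assuming the weights were aligned with set_weights as above:

import numpy as np

f_out = forward_layer(inputs)
b_out = backward_layer(inputs)
fb_out = fb(inputs)

# Forward half of the concatenation matches the standalone forward layer as-is.
print(np.allclose(fb_out[..., 0], f_out[..., 0], atol=1e-6))                        # True

# Backward half matches only after flipping the time axis, because the wrapper
# re-reverses the backward output while the standalone go_backwards LSTM does not.
print(np.allclose(fb_out[..., 1], tf.reverse(b_out, axis=[1])[..., 0], atol=1e-6))  # True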

Upvotes: 0

dtlam26

Reputation: 1600

The principle of Bidirectional is not as simple as taking the sum (or concatenation) of a forward pass and a backward pass. A BiLSTM processes the input in both directions, and how each direction combines information is decided by the LSTM's tanh and sigmoid gates. Therefore, if you split it into two separate processes, the results can't be the same: one learns its weights on the raw bidirectional input, the other uses two separately initialized layers.
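For reference, these are the standard gate equations inside a single LSTM direction (textbook form, not taken from the Keras source):

\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}

Each direction runs this recurrence over its own ordering of the sequence, and the two hidden-state sequences are then merged according to merge_mode.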

[Diagram: bidirectional LSTM structures]

Upvotes: 2
