Eka

Reputation: 15002

What is the reason behind different Keras layer initializations?

I have seen this type of layer initialization in Keras:

from keras.models import Model
from keras.layers import Input, Dense

a = Input(shape=(32,))
b = Dense(32)(a)
c = Dense(b)

It's the initialization of the layer c that is confusing. I also have a class like this:

import tensorflow as tf

class Attention(tf.keras.Model):
    def __init__(self, units):
        super(Attention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, features, hidden):
        hidden_with_time_axis = tf.expand_dims(hidden, 1)
        score = tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time_axis))
        attention_weights = tf.nn.softmax(self.V(score), axis=1)
        context_vector = attention_weights * features
        context_vector = tf.reduce_sum(context_vector, axis=1)

        return context_vector, attention_weights

Look at self.W1(features): it takes the previous layer's features and passes them to W1, a Dense layer that has already been initialized with a given number of units. What is happening in this step, and why are we doing it?

EDIT:

class Foo:
    def __init__(self, units):
        self.units=units
    def __call__(self):
        print('called ' + str(self.units))


a=Foo(3)
b=Foo(a)

Why do we need to call an object like a function?

Upvotes: 0

Views: 114

Answers (1)

xdurch0

Reputation: 10474

There is a difference between initializing and calling a layer.

b = Dense(32)(a) initializes a dense layer with 32 hidden units and then immediately calls this layer on the input a. For this you need to be aware of the concept of callable objects in Python; basically any object that has a __call__ function defined (which the keras base Layer class does) can be called on an input, i.e. used like a function.
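
To illustrate the callable-object idea in plain Python (the class name Multiplier and its behaviour here are just a made-up sketch, not anything from Keras):

class Multiplier:
    def __init__(self, factor):
        # "initialization": store the configuration
        self.factor = factor

    def __call__(self, x):
        # "calling": apply the object to an input, like a function
        return x * self.factor

double = Multiplier(2)   # initialize with factor 2
print(double(10))        # call on an input -> prints 20

Dense(32)(a) follows the same two steps back to back: Dense(32) constructs the layer object, and (a) calls it on the input tensor.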

c = Dense(b) most certainly won't work, and if you have really seen this in a tutorial or piece of code somewhere, I would avoid that source in the future... This would attempt to create a layer with b units, which makes no sense if b is the output of another dense layer. Most likely, whatever you saw was actually something like c = Dense(n_units)(b).
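
For reference, a minimal functional-API version of what that snippet was presumably meant to look like (the layer sizes 32 and 16 are just placeholders):

from keras.models import Model
from keras.layers import Input, Dense

a = Input(shape=(32,))
b = Dense(32)(a)    # create a Dense layer with 32 units, then call it on a
c = Dense(16)(b)    # create another Dense layer and call it on the output b
model = Model(inputs=a, outputs=c)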

That being said, all that happens in the Attention piece of code is that the layer self.W1 is called on features (same for W2) after it had previously been initialized in __init__.
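
The same two-step pattern can be seen outside a model; here is a small sketch (the shapes are arbitrary):

import tensorflow as tf

w1 = tf.keras.layers.Dense(10)        # initialization: the layer object is created, as in __init__
features = tf.random.normal((4, 8))  # dummy batch of 4 feature vectors
out = w1(features)                    # calling: the layer is applied to the input, giving shape (4, 10)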

Upvotes: 1
