Reputation: 73
I have a Keras Sequential model consisting of some Dense layers. I set the trainable property of the whole model to False. But I see that the individual layers still have their trainable property set to True. Do I need to individually set the layers' trainable property to False as well? Then what is the meaning of setting the trainable property to False on the whole model?
Upvotes: 3
Views: 1535
Reputation: 33450
To answer this you need to take a look at the source code of Keras. You might be surprised after doing so, because you would realize that:

- the Sequential class is a subclass of the Model class, and
- the Model class is a subclass of the Network class, and
- the Network class is a subclass of the Layer class!

As I said, it might be a bit surprising that a Keras model is derived from a Keras layer. But if you think about it further, you would find it reasonable, since they have a lot of common functionality (e.g. both take some inputs, do some computations on them, produce some outputs, and update their internal weights/parameters). One of their common attributes is the trainable attribute. Now, when you set the trainable property of a model to False, the model skips the weight update step. In other words, it does not check the trainable attribute of its underlying layers; rather, it first checks its own trainable attribute (more precisely, in the Network class), and if that is False the updates are skipped. Therefore, this does not mean that its underlying layers have their trainable attribute set to False as well. And there is a good reason for not doing that: a single instance of a layer could be used in multiple models. For example, consider the following two models which have a shared layer:
from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=...)
shared_layer = Dense(...)
sout = shared_layer(inp)

m1_out = Dense(...)(sout)
m2_out = Dense(...)(sout)

model1 = Model(inp, m1_out)
model2 = Model(inp, m2_out)
Now if we set model1.trainable = False, this would freeze the whole model1 (i.e. training model1 does not update the weights of its underlying layers, including shared_layer); however, shared_layer and model2 are still trainable (i.e. training model2 would update the weights of all of its layers, including shared_layer). On the other hand, if we set model1.layers[1].trainable = False, then shared_layer is frozen, and therefore its weights would not be updated when training either model1 or model2. This way you have much more control and flexibility, and you can therefore build more complex architectures (e.g. GANs).
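The update-skipping logic described above can be sketched in plain Python, with no Keras dependency. ToyLayer and ToyModel below are hypothetical stand-ins, not Keras classes; they only mimic the order of the trainable checks described in this answer (model's own flag first, then each layer's flag) and the shared-layer behavior:

```python
class ToyLayer:
    """Minimal stand-in for a Keras layer holding one weight."""
    def __init__(self, name):
        self.name = name
        self.trainable = True
        self.weight = 0.0

class ToyModel:
    """Minimal stand-in for a Keras model (which is itself layer-like)."""
    def __init__(self, layers):
        self.layers = layers
        self.trainable = True  # the model's *own* flag, like Network's

    def train_step(self):
        # Like Keras, first check the model's own trainable flag ...
        if not self.trainable:
            return  # skip all weight updates
        # ... and only then honor each layer's individual flag.
        for layer in self.layers:
            if layer.trainable:
                layer.weight += 1.0  # stand-in for a gradient update

shared = ToyLayer("shared")
model1 = ToyModel([shared, ToyLayer("head1")])
model2 = ToyModel([shared, ToyLayer("head2")])

# Freezing model1 does NOT flip its layers' trainable flags ...
model1.trainable = False
model1.train_step()                     # no-op: model1 itself is frozen
assert shared.trainable and shared.weight == 0.0
model2.train_step()                     # model2 still updates shared
assert shared.weight == 1.0

# ... whereas freezing the shared layer itself affects both models.
shared.trainable = False
model1.trainable = True
model1.train_step()
model2.train_step()
assert shared.weight == 1.0             # shared is no longer updated
```

Note that the two models hold the very same shared object, which is exactly why freezing the layer (rather than a model) affects both of them.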
Upvotes: 10