user257330

Reputation: 73

In the case of two Keras models sharing layers, which model should be compiled after setting trainable=False?

I have two Keras models I need to train. Let's say the first model has 5 layers. Now I take the last 3 layers of the first model to be another model, like this:

from keras.layers import Input, Dense
from keras.models import Model

input = Input(shape=(100,))
x1 = Dense(50, activation='relu')(input)
x2 = Dense(50, activation='relu')(x1)
x3 = Dense(50, activation='relu')(x2)
x4 = Dense(50, activation='relu')(x3)
output = Dense(10, activation='softmax')(x4)

model1 = Model(inputs=input, outputs=output)
model2 = Model(inputs=x3, outputs=output)

model1.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model2.compile(optimizer='rmsprop', loss='categorical_crossentropy')

Now, for some reason, I need to train model1 on batches, i.e. I can't call the fit() method and do the training in one pass:

for epoch in range(10):
    model1.train_on_batch(x, y)

Now coming to the problem: I need to toggle model2's trainable parameter multiple times inside each epoch. Think of a GAN-like scenario. So inside the loop I need to do this:

model2.trainable = False   # sometimes
model2.trainable = True    # other times

However, Keras says that after toggling a model's trainable attribute, you need to re-compile the model for the change to take effect. But I cannot figure out which model to compile. The layers are shared between model1 and model2. Is compiling either one of them fine, or do I need to compile both?

In other words, are the following equivalent or not?

Case 1:

model2.trainable = False
model1.compile(optimizer='rmsprop', loss='categorical_crossentropy')

Case 2:

model2.trainable = False
model2.compile(optimizer='rmsprop', loss='categorical_crossentropy')

Case 3:

model2.trainable = False
model1.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model2.compile(optimizer='rmsprop', loss='categorical_crossentropy')
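
For reference, here is a minimal sketch of how the current flags can be inspected (assuming Keras 2.x; trainable_weights reflects the flags as they are right now, while each optimizer keeps using the snapshot taken at its compile() call):

model2.trainable = False

# model2 itself now reports no trainable weights; model1 is a separate
# Model object, so its own collection is not affected by model2's flag
print(len(model2.trainable_weights))  # 0
print(len(model1.trainable_weights))  # still all the Dense weights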

Upvotes: 2

Views: 1119

Answers (1)

ana

Reputation: 416

You need to compile both models separately before training, rather than re-compiling inside the loop (otherwise you will be filling your memory for nothing): one with the layers frozen, the other without. And if you only ever fit from input to output, there is no reason to compile the frozen part on its own.

Also, Keras will complain if you try to define a Model with an intermediate tensor as input; you need to create two models and then chain them one after the other in the pipeline:

input = Input(shape=(100,))
x1 = Dense(50, activation='relu')(input)
x2 = Dense(50, activation='relu')(x1)
x3 = Dense(50, activation='relu')(x2)
aux_model1 = Model(inputs=input, outputs=x3)

# a fresh Input with the same shape as x3 (works with the TF backend)
x3_input = Input(shape=x3.shape.as_list()[1:])
x4 = Dense(50, activation='relu')(x3_input)
output = Dense(10, activation='softmax')(x4)
aux_model2 = Model(inputs=x3_input, outputs=output)

# chain the two sub-models into the full pipeline
x3 = aux_model1(input)
output = aux_model2(x3)
model1 = Model(inputs=input, outputs=output)

Now compile model1 to train with all layers trainable:

model1.compile(optimizer='rmsprop', loss='categorical_crossentropy')

Then freeze the layers in aux_model2, build model2 on the same graph, and compile it:

# model1 was compiled while everything was trainable, so it keeps
# updating all weights; model2 is compiled after the freeze
for layer in aux_model2.layers:
    layer.trainable = False
model2 = Model(inputs=input, outputs=output)

model2.compile(optimizer='rmsprop', loss='categorical_crossentropy')
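
Note that model1 and model2 are built over the very same aux_model1 and aux_model2 objects, so weight updates made through either model are immediately visible through the other. A minimal check (the layer order here is an assumption that holds for this simple chain):

# both models wrap the same sub-model objects, hence shared weights
assert model1.layers[1] is model2.layers[1]  # aux_model1
assert model1.layers[2] is model2.layers[2]  # aux_model2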

And then train either model1 or model2 depending on the condition:

for epoch in range(10):
    if training_layers:          # your own condition for this phase
        model1.train_on_batch(x, y)
    else:
        model2.train_on_batch(x, y)
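
To double-check that the freeze works as intended, you can compare aux_model2's weights around a single update through model2 (a minimal sketch; x and y stand for one batch of your data, as above):

import numpy as np

before = [w.copy() for w in aux_model2.get_weights()]
model2.train_on_batch(x, y)
after = aux_model2.get_weights()

# the frozen layers should come out bit-identical
assert all(np.array_equal(b, a) for b, a in zip(before, after))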

Upvotes: 3
