hobbs

Reputation: 240394

Training two Keras models in tandem with partially shared weights

I'm using Keras for a (character) sequence-to-sequence RNN application. Since I have a relatively small set of examples of A -> B, and a much larger set of examples of B, I decided to try an autoencoder approach: first train a network to learn the identity function on B, producing an embedding for members of B, then train a network to learn A -> embedding(B). By combining the second network with the decoder-half of the first network, hopefully it will generalize to produce plausible Bs.

The code, modeled after the Building Autoencoders in Keras tutorial, looks something like this (several layers, dropouts, regularizations, etc. have been left out for the sake of simplicity):

from keras.models import Model
from keras.layers import Input, Dense, Masking, RepeatVector, TimeDistributed

class Example:
    def __init__(self, ...):
        # Sets dense_size, rnn, rnn_size, embed_size, input_len, output_len, etc.

    def apply_encoder(self, input_word):
        input_masked = Masking()(input_word)
        input_dense = TimeDistributed(Dense(self.dense_size), name='input_dense')(input_masked)
        rnn = self.rnn(self.rnn_size, name='input_rnn')(input_dense)
        embedding = Dense(self.embed_size, name='embedding')(rnn)
        return embedding

    def apply_decoder(self, embedding):
        repeated = RepeatVector(self.output_len, name='repeat')(embedding)
        rnn = self.rnn(self.rnn_size, name='output_rnn')(repeated)
        output_dense = TimeDistributed(Dense(self.dense_size), name='output_dense')(rnn)
        output_word = TimeDistributed(
            Dense(self.chars, activation='softmax'),
            name='output_word'
            )(output_dense)
        return output_word

    def build_net(self):
        input_word = Input(shape=(self.input_len, self.chars), name='input_word')
        embedding = self.apply_encoder(input_word)
        output_word = self.apply_decoder(embedding)

        self.autoencoder = Model(input_word, output_word)
        self.encoder = Model(input_word, embedding)

        embed_input = Input(shape=(self.embed_size,), name='input_embedding')
        decoder_output = self.apply_decoder(embed_input)

        self.decoder = Model(embed_input, decoder_output)

    def save_models(self):
        with open('models/autoencoder.json', 'w') as f:
            f.write(self.autoencoder.to_json())
        with open('models/encoder.json', 'w') as f:
            f.write(self.encoder.to_json())
        with open('models/decoder.json', 'w') as f:
            f.write(self.decoder.to_json())

First, one script trains autoencoder on B -> B; then another script instantiates the encoder twice and trains encoderA on A -> encoderB.predict(B); finally, the query script uses encoderA and decoderB to make predictions.
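In sketch form, the middle step looks something like this (names are illustrative; A_onehot and B_onehot stand for the one-hot-encoded input arrays):

embedding_targets = encoderB.predict(B_onehot)  # fixed targets from the trained B autoencoder
encoderA.compile(optimizer='adam', loss='mse')  # regress A's encoder onto the embedding space
encoderA.fit(A_onehot, embedding_targets, batch_size=32, epochs=10)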

This all works fine, but the performance isn't as good as I would like, so what I'd really like to do is train both models in tandem. What I want is two autoencoder models with separate encoder halves but shared weights for the decoder halves. I would then alternate between training model A on a batch of A -> B and training model B on a batch of B -> B; this should update the two encoders on alternate batches but update the shared decoder on every batch.
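Concretely, the alternating loop I have in mind would look something like this (modelA and modelB are placeholders for the two models I don't know how to build):

for (a_batch, b_target), b_batch in zip(ab_batches, b_batches):
    modelA.train_on_batch(a_batch, b_target)  # updates encoder A and the shared decoder
    modelB.train_on_batch(b_batch, b_batch)   # updates encoder B and the same decoder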

My question is simply: how can I construct these two models so that the weights are shared in the way I want? Here is a similar question, but it only explains how to do what I've already done. In case the backend matters (it probably doesn't), I can use TF or Theano.

Upvotes: 4

Views: 2226

Answers (1)

Daniel Möller

Reputation: 86620

Build the parts as separate models using the functional API, then join them as if they were layers.

The difference is that your code currently creates two independent decoders (by calling apply_decoder twice), each with its own weights.
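In Keras, every call such as Dense(...) or LSTM(...) constructs a brand-new layer with its own weights; weights are shared only when the very same layer (or model) instance is reused. A minimal illustration, with made-up sizes:

from keras.layers import Input, Dense

x1 = Input(shape=(8,))
x2 = Input(shape=(8,))

shared = Dense(4)    # one layer instance, one set of weights
y1 = shared(x1)      # both calls reuse the same weights
y2 = shared(x2)

z1 = Dense(4)(x1)    # two separate Dense(...) calls create
z2 = Dense(4)(x2)    # two independent sets of weights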

Encoder A:

aInput = Input(...)
encodedA = LotsOfLayers(...)(aInput)

self.encoderA = Model(aInput, encodedA)

Encoder B:

bInput = Input(...)
encodedB = LotsOfLayers(...)(bInput)

self.encoderB = Model(bInput, encodedB)

Decoder:

We create just one decoder here:

encodedInput = Input(...)
decodedOutput = LotsOfLayers(...)(encodedInput)

self.decoder = Model(encodedInput, decodedOutput)

AutoencoderB:

Here's the "jump of the cat":

autoInput = Input(sameShapeAsEncoderBInput)
encoded = self.encoderB(autoInput)
decoded = self.decoder(encoded)

self.autoencoderB = Model(autoInput, decoded)

Predictor from A:

Follow the same logic:

anotherAInput = Input(sameShapeAsEncoderAInput)
encoded = self.encoderA(anotherAInput)
decoded = self.decoder(encoded)

self.predictorFromA = Model(anotherAInput, decoded)

This makes the decoder one and the same model (sharing weights) for both autoencoderB and predictorFromA: Keras treats the nested decoder as a shared layer, so its weights are updated whichever of the two outer models you train.
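Put together, a minimal end-to-end sketch of this wiring (the LSTM choice and all sizes are placeholders, not taken from the question):

from keras.layers import Input, Dense, LSTM, RepeatVector, TimeDistributed
from keras.models import Model

input_len, output_len, chars = 20, 20, 60   # placeholder dimensions
rnn_size, embed_size = 128, 32

def make_encoder(name):
    # Same architecture for A and B, but each call builds independent weights
    inp = Input(shape=(input_len, chars))
    rnn = LSTM(rnn_size)(inp)
    emb = Dense(embed_size)(rnn)
    return Model(inp, emb, name=name)

encoderA = make_encoder('encoderA')
encoderB = make_encoder('encoderB')

# Build the decoder exactly once, so only one set of decoder weights exists
emb_in = Input(shape=(embed_size,))
rep = RepeatVector(output_len)(emb_in)
seq = LSTM(rnn_size, return_sequences=True)(rep)
out = TimeDistributed(Dense(chars, activation='softmax'))(seq)
decoder = Model(emb_in, out, name='decoder')

# Join the part models as if they were layers
a_in = Input(shape=(input_len, chars))
predictorFromA = Model(a_in, decoder(encoderA(a_in)))

b_in = Input(shape=(input_len, chars))
autoencoderB = Model(b_in, decoder(encoderB(b_in)))

predictorFromA.compile(optimizer='adam', loss='categorical_crossentropy')
autoencoderB.compile(optimizer='adam', loss='categorical_crossentropy')

Because decoder is a single Model instance appearing in both outer models, alternating predictorFromA.train_on_batch(...) and autoencoderB.train_on_batch(b, b) updates the two encoders on alternate batches and the shared decoder on every batch, exactly as the question describes.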

Upvotes: 2
