Reputation: 240394
I'm using Keras for a (character) sequence-to-sequence RNN application. Since I have a relatively small set of examples of A -> B, and a much larger set of examples of B, I decided to try an autoencoder approach: first train a network to learn the identity function on B, producing an embedding for members of B, then train a network to learn A -> embedding(B). By combining the second network with the decoder-half of the first network, hopefully it will generalize to produce plausible Bs.
The code, modeled after the Building Autoencoders in Keras tutorial, looks something like this (several layers, dropouts, regularizations, etc. have been left out for the sake of simplicity):
from keras.layers import Input, Dense, Masking, RepeatVector, TimeDistributed
from keras.models import Model

class Example:
    def __init__(self, ...):
        # Sets dense_size, rnn, rnn_size, embed_size, input_len, output_len, etc.

    def apply_encoder(self, input_word):
        input_masked = Masking()(input_word)
        input_dense = TimeDistributed(Dense(self.dense_size), name='input_dense')(input_masked)
        rnn = self.rnn(self.rnn_size, name='input_rnn')(input_dense)
        embedding = Dense(self.embed_size, name='embedding')(rnn)
        return embedding

    def apply_decoder(self, embedding):
        repeated = RepeatVector(self.output_len, name='repeat')(embedding)
        rnn = self.rnn(self.rnn_size, name='output_rnn')(repeated)
        output_dense = TimeDistributed(Dense(self.dense_size), name='output_dense')(rnn)
        output_word = TimeDistributed(
            Dense(self.chars, activation='softmax'),
            name='output_word'
        )(output_dense)
        return output_word

    def build_net(self):
        input_word = Input(shape=(self.input_len, self.chars), name='input_word')
        embedding = self.apply_encoder(input_word)
        output_word = self.apply_decoder(embedding)
        self.autoencoder = Model(input_word, output_word)
        self.encoder = Model(input_word, embedding)

        embed_input = Input(shape=(self.embed_size,), name='input_embedding')
        decoder_output = self.apply_decoder(embed_input)
        self.decoder = Model(embed_input, decoder_output)

    def save_models(self):
        open('models/autoencoder.json', 'w').write(self.autoencoder.to_json())
        open('models/encoder.json', 'w').write(self.encoder.to_json())
        open('models/decoder.json', 'w').write(self.decoder.to_json())
First, one script trains autoencoder on B -> B; then another script instantiates encoder twice and trains encoderA on A -> encoderB.predict(B); finally, the query script uses encoderA and decoderB to make predictions.
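Roughly, the scripts do something like this (the netA / netB handles, the data arrays, and the loss/optimizer choices below are just illustrative; saving and reloading the models between scripts is elided):

# Script 1: train the B -> B autoencoder.
netB.autoencoder.compile(optimizer='adam', loss='categorical_crossentropy')
netB.autoencoder.fit(B, B)

# Script 2: train a second encoder to reproduce the embeddings of the Bs paired with As.
targets = netB.encoder.predict(B_paired_with_A)
netA.encoder.compile(optimizer='adam', loss='mse')
netA.encoder.fit(A, targets)

# Query script: chain encoderA with the decoder half of the B autoencoder.
predictions = netB.decoder.predict(netA.encoder.predict(A_query))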
This all works just fine, but the performance isn't as good as I would like, so what I would really like to do is to train both models in tandem. What I want is two autoencoder models with separate encoder halves, but shared weights for the decoder halves. Then I alternate between training model A on a batch of A -> B, and training model B on a batch of B -> B, which should update the two encoders on alternate batches, but update the shared decoder on every batch.
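In pseudocode, the training loop I have in mind is roughly this (model_A, model_B, and batches are just placeholder names):

for batch_A, batch_B_for_A, batch_B in batches:
    model_A.train_on_batch(batch_A, batch_B_for_A)   # updates encoder A + shared decoder
    model_B.train_on_batch(batch_B, batch_B)         # updates encoder B + shared decoder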
My question is, simply: how can I construct these two models so that the weights are shared in the way I want? Here is a similar question, but it only explains how to do what I've already done. In case the backend matters (it probably doesn't), I can use either TF or Theano.
Upvotes: 4
Views: 2226
Reputation: 86620
Build the parts as separate models using the functional API, then join those part models as if they were layers.
The difference from your code is that build_net calls apply_decoder twice, so you end up with two independent decoders; here we create the decoder only once and reuse it.
Encoder A:
aInput = Input(...)
encodedA = LotsOfLayers(...)(aInput)
self.encoderA = Model(aInput, encodedA)
Encoder B:
bInput = Input(...)
encodedB = LotsOfLayers(...)(bInput)
self.encoderB = Model(bInput, encodedB)
Decoder:
We create just one decoder here:
encodedInput = Input(...)
decodedOutput = LotsOfLayers(...)(encodedInput)
self.decoder = Model(encodedInput, decodedOutput)
AutoencoderB:
Here's the key trick:
autoInput = Input(sameShapeAsEncoderBInput)
encoded = self.encoderB(autoInput)
decoded = self.decoder(encoded)    # reuse the one shared decoder model as a layer
self.autoencoderB = Model(autoInput, decoded)
Predictor from A:
Follow the same logic:
anotherAInput = Input(sameShapeAsEncoderAInput)
encoded = self.encoderA(anotherAInput)
decoded = self.decoder(encoded)    # the same decoder instance again, so its weights are shared
self.predictorFromA = Model(anotherAInput, decoded)
This makes the decoder literally the same model (with shared weights) inside both autoencoderB and predictorFromA.
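A quick way to convince yourself the sharing works (illustrative sketch; the loss/optimizer choices are placeholders, and it assumes the models above have been built):

self.autoencoderB.compile(optimizer='adam', loss='categorical_crossentropy')
self.predictorFromA.compile(optimizer='adam', loss='categorical_crossentropy')

# The decoder's weight variables are the very same objects inside both outer models,
# so a training step on either model updates the shared decoder.
decoderWeights = set(id(w) for w in self.decoder.weights)
assert decoderWeights <= set(id(w) for w in self.autoencoderB.weights)
assert decoderWeights <= set(id(w) for w in self.predictorFromA.weights)

You can then alternate train_on_batch (or fit) calls between predictorFromA and autoencoderB, exactly as described in the question, and the decoder gets updated on every batch.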
Upvotes: 2