Reputation: 179
I am working on a project based on this great tutorial. https://machinelearningmastery.com/develop-encoder-decoder-model-sequence-sequence-prediction-keras/
I have had to pad the end of my input and output sequences with zeros to keep them the same length, e.g.
[72 1 62 0 68 4 72 0 63 0 68 5 83 3 87 1 86 1 84 3 86 13 74 0 71 2 87 5 90 3 63 0 66 0 76 2 36 1 38 1 67 0 34 0 61 4 89 4 62 0 40 0 63 0 31 1 39 5 88 4 68 0 68 0 72 3 71 0 78 3 67 1 66 0 64 5 63 1 67 2 61 0 61 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[77 77 5 76 2 77 78 71 1 79 1 77 76 79 71 71 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
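For reference, the post-padding shown above can be reproduced with a small helper. This is a minimal NumPy sketch with made-up short sequences; in practice Keras users would typically use `pad_sequences(..., padding='post')` from `keras.preprocessing.sequence` instead:

```python
import numpy as np

def pad_post(sequences, maxlen, value=0):
    """Right-pad each sequence with `value` to a fixed length."""
    out = np.full((len(sequences), maxlen), value, dtype=int)
    for i, seq in enumerate(sequences):
        out[i, :len(seq)] = seq[:maxlen]  # truncate if longer than maxlen
    return out

padded = pad_post([[72, 1, 62], [77, 77, 5, 76, 2]], maxlen=8)
print(padded)
```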
However, this means that during training/validation the model will score a higher validation result than it should, because it easily learns to match the zero padding at the end of the sequences.
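Here is a toy illustration of that inflated score, using a hypothetical short target sequence (pure NumPy, not the actual model): a degenerate model that only ever predicts the pad value 0 still scores well over the whole padded sequence, while its accuracy on the real tokens alone is zero.

```python
import numpy as np

# Targets right-padded with 0; predictions that only ever output 0.
targets = np.array([77, 77, 5, 76, 0, 0, 0, 0, 0, 0])
preds = np.zeros_like(targets)

plain_acc = (preds == targets).mean()  # counts padding as "correct"
mask = targets != 0                    # keep only real (non-pad) positions
masked_acc = (preds[mask] == targets[mask]).mean()

print(plain_acc, masked_acc)  # 0.6 vs 0.0
```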
I have added the Masking layer below to encoder_inputs and decoder_inputs, but after making the modification I get this error:
encoder_inputs = Input(shape=(None, n_in))
encoder_inputs = Masking(mask_value=0)(encoder_inputs) #****** TEST *****
....
decoder_inputs = Input(shape=(None, n_out))
decoder_inputs = Masking(mask_value=0)(decoder_inputs) #****** TEST *****
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_14:0", shape=(None, None, 80), dtype=float32) at layer "input_14". The following previous layers were accessed without issue: ['input_13']
def define_models(n_in, n_out, n_units):
    # define encoder
    encoder_inputs = Input(shape=(None, n_in))
    encoder_inputs = Masking(mask_value=0)(encoder_inputs) #****** TEST *****
    encoder = LSTM(n_units, return_state=True)
    encoder_outputs, state_h, state_c = encoder(encoder_inputs)
    encoder_states = [state_h, state_c]
    # define decoder
    decoder_inputs = Input(shape=(None, n_out))
    decoder_inputs = Masking(mask_value=0)(decoder_inputs) #****** TEST *****
    decoder_lstm = LSTM(n_units, return_sequences=True, return_state=True)
    decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
    decoder_dense = Dense(n_out, activation='softmax')
    decoder_outputs = decoder_dense(decoder_outputs)
    model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
    # define inference encoder
    encoder_model = Model(encoder_inputs, encoder_states)
    # define inference decoder
    decoder_state_input_h = Input(shape=(n_units,))
    decoder_state_input_c = Input(shape=(n_units,))
    decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
    decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
    decoder_states = [state_h, state_c]
    decoder_outputs = decoder_dense(decoder_outputs)
    decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)
    # return all models
    return model, encoder_model, decoder_model
Any idea how to tweak this so the new Masking layers work?
Thanks
Upvotes: 4
Views: 1047
Reputation: 22031
Pay attention when you define your layers not to override them, especially the inputs. Your Masking calls reassign encoder_inputs and decoder_inputs, so when you build the Models you are passing the masked tensors instead of the original Input tensors, which is why the graph is disconnected. Keep the Input tensors in their own variables and pass those to Model:
def define_models(n_in, n_out, n_units):
    # define encoder
    enc_inputs = Input(shape=(None, n_in))
    encoder_inputs = Masking(mask_value=0)(enc_inputs) #****** TEST *****
    encoder = LSTM(n_units, return_state=True)
    encoder_outputs, state_h, state_c = encoder(encoder_inputs)
    encoder_states = [state_h, state_c]
    # define decoder
    dec_inputs = Input(shape=(None, n_out))
    decoder_inputs = Masking(mask_value=0)(dec_inputs) #****** TEST *****
    decoder_lstm = LSTM(n_units, return_sequences=True, return_state=True)
    decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
    decoder_dense = Dense(n_out, activation='softmax')
    decoder_outputs = decoder_dense(decoder_outputs)
    model = Model([enc_inputs, dec_inputs], decoder_outputs)
    # define inference encoder
    encoder_model = Model(enc_inputs, encoder_states)
    # define inference decoder
    decoder_state_input_h = Input(shape=(n_units,))
    decoder_state_input_c = Input(shape=(n_units,))
    decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
    decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
    decoder_states = [state_h, state_c]
    decoder_outputs = decoder_dense(decoder_outputs)
    decoder_model = Model([dec_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)
    # return all models
    return model, encoder_model, decoder_model
Upvotes: 1