R Nanthak

Reputation: 363

Random Initialisation of Hidden State of LSTM in keras

I am using a model for my music generation project. The model is created as follows:

        self.model.add(LSTM(self.hidden_size, input_shape=(self.input_length, self.notes_classes), return_sequences=True, recurrent_dropout=dropout))
        self.model.add(LSTM(self.hidden_size, recurrent_dropout=dropout, return_sequences=True))
        self.model.add(LSTM(self.hidden_size, return_sequences=True))
        self.model.add(BatchNorm())
        self.model.add(Dropout(dropout))
        self.model.add(Dense(256))
        self.model.add(Activation('relu'))
        self.model.add(BatchNorm())
        self.model.add(Dropout(dropout))
        self.model.add(Dense(256))
        self.model.add(Activation('relu'))
        self.model.add(BatchNorm())
        self.model.add(Dense(self.notes_classes))
        self.model.add(Activation('softmax'))

After training this model to about 70% accuracy, whenever I generate music it always produces the same kind of starting notes, with only little variation, regardless of the input notes. I think this could be solved by initialising the hidden state of the LSTM at the start of generation. How can I do that?

Upvotes: 1

Views: 308

Answers (1)

Daniel Möller

Reputation: 86650

There are two states: state_h, which is the output of the last step, and state_c, which is the carry-over state, or memory.
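As a quick standalone illustration (a minimal sketch, separate from your model, with made-up shapes), an LSTM built with return_state=True returns the sequence output together with both states:

import numpy as np
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

# Toy shapes: 10 timesteps, 4 features, 8 units.
x = Input((10, 4))
seq, state_h, state_c = LSTM(8, return_sequences=True, return_state=True)(x)
probe = Model(x, [seq, state_h, state_c])

out_seq, h, c = probe.predict(np.zeros((1, 10, 4)))
print(out_seq.shape, h.shape, c.shape)   # (1, 10, 8) (1, 8) (1, 8)

Both states have shape (batch, units), which is why the state inputs below are one-dimensional per sample.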

You should use a functional API model to have more than one input:

# Assuming the usual tf.keras imports, e.g.:
# from tensorflow.keras.layers import Input, LSTM, Dense, Dropout, Activation
# from tensorflow.keras.layers import BatchNormalization as BatchNorm
# from tensorflow.keras.models import Model

main_input = Input((self.input_length, self.notes_classes))
state_h_input = Input((self.hidden_size,))
state_c_input = Input((self.hidden_size,))   # the cell state has the same shape as state_h

out = LSTM(self.hidden_size, return_sequences=True,recurrent_dropout=dropout,
           initial_state=[state_h_input, state_c_input])(main_input)

# I'm not changing the following layers; give them their own state inputs too if you want.

out = LSTM(self.hidden_size,recurrent_dropout=dropout,return_sequences=True)(out)
out = LSTM(self.hidden_size,return_sequences=True)(out)
out = BatchNorm()(out)
out = Dropout(dropout)(out)
out = Dense(256)(out)
out = Activation('relu')(out)
out = BatchNorm()(out)
out = Dropout(dropout)(out)
out = Dense(256)(out)
out = Activation('relu')(out)
out = BatchNorm()(out)
out = Dense(self.notes_classes)(out)
out = Activation('softmax')(out)

self.model = Model([main_input, state_h_input, state_c_input], out)

Following this approach, it's even possible to use outputs of other layers as initial states, if you want trainable initial states.
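A minimal sketch of that idea, assuming a separate conditioning input (the condition name and the tanh Dense layers are illustrative, not part of your original model):

# Hypothetical: learn the initial states from a conditioning vector
# instead of feeding them in from outside.
condition = Input((self.notes_classes,))
init_h = Dense(self.hidden_size, activation='tanh')(condition)
init_c = Dense(self.hidden_size, activation='tanh')(condition)

out = LSTM(self.hidden_size, return_sequences=True,
           recurrent_dropout=dropout,
           initial_state=[init_h, init_c])(main_input)
# ... rest of the stack as above ...
self.model = Model([main_input, condition], out)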

The big change is that you will need to pass the states for training and predicting:

model.fit([original_inputs, state_h_data, state_c_data], y_train) 

You can use zeros for the states during training, and random states at generation time to get the varied starting notes you asked about.
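For example (a sketch, assuming numpy as np, the shapes above, and seed_input as whatever seed sequence you feed the generator):

import numpy as np

# Zero states for training.
state_h_data = np.zeros((len(original_inputs), self.hidden_size))
state_c_data = np.zeros((len(original_inputs), self.hidden_size))
model.fit([original_inputs, state_h_data, state_c_data], y_train)

# At generation time, sample random states so the first notes differ per run.
rand_h = np.random.normal(0, 1, (1, self.hidden_size))
rand_c = np.random.normal(0, 1, (1, self.hidden_size))
prediction = model.predict([seed_input, rand_h, rand_c])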

Upvotes: 1
