Janina Nuber
Janina Nuber

Reputation: 317

Inner workings of Keras LSTM

I am working on a multi-class classification task: the goal is to identify what is the correct language of origin of a certain surname. For this, I am using a Keras LSTM. So far, I have only worked with PyTorch and I am very surprised by the "black box" character of Keras. For this classification task, my understanding is that I need to retrieve the output of the last time step for a given input sequence in the LSTM and then apply the softmax on it to get the probability distribution over all classes. Interestingly, without me explicitly defining to do so, the LSTM seems to automatically do the right thing and chooses the last time step's output and not e.g. the hidden state to apply the softmax on (good training & validation results so far). How is that possible? Does the choice of the appropriate loss function categorical_crossentropy indicate to the model that is should use the last time step's output to do the classification?

Code:

model = Sequential()

model.add(Dense(100, input_shape=(max_len, len(alphabet)), kernel_regularizer=regularizers.l2(0.00001)))

model.add(Dropout(0.85))

model.add(LSTM(100, input_shape=(100,))) 

model.add(Dropout(0.85))

model.add(Dense(num_output_classes, activation='softmax'))

adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, decay=1e-6)

model.compile(loss='categorical_crossentropy',
              optimizer=adam,
              metrics=['accuracy'])

history = model.fit(train_data, train_labels,
          epochs=5000,
          batch_size=num_train_examples,
          validation_data = (valid_data, valid_labels))

Upvotes: 3

Views: 175

Answers (1)

hobbs
hobbs

Reputation: 239930

No, returning the last time step's output is just what every Keras RNN layer does by default. See the documentation for return_sequences, which causes it to return every time step's output instead (which is necessary for stacking RNN layers). There's no automatic intuition based on what kinds of layers you're hooking together, you just got what you wanted by default, presumably because the designers figured that to be the most common case.

Upvotes: 3

Related Questions