Reputation: 2554
I am fairly new to Keras and DNNs in general. Starting from some tutorials, I have managed to create a model for classifying sentences, shown below. To be honest, I do not know for sure what the intuition behind it is or why it works. So this is my question.
from keras.models import Sequential
from keras.layers import (Embedding, Dropout, Conv1D, MaxPooling1D,
                          LSTM, GlobalMaxPooling1D, Dense)

def create_model():
    embedding_layer = Embedding(input_dim=100, output_dim=300,
                                input_length=100)
    model = Sequential()
    model.add(embedding_layer)
    model.add(Dropout(0.2))
    model.add(Conv1D(filters=100, kernel_size=4, padding='same', activation='relu'))
    model.add(MaxPooling1D(pool_size=4))
    model.add(LSTM(units=100, return_sequences=True))
    model.add(GlobalMaxPooling1D())
    # model.add(Dense(1, activation='sigmoid'))
    ###### multiclass classification #########
    model.add(Dense(3, activation='sigmoid'))  # I want to replace the line above with this for multiclass classification, but it didn't work
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
And here is my understanding: the model starts by training word embeddings on the corpus (of sentences) and represents each sentence as a sequence of word vectors (embedding_layer). The Dropout layer then forces the model not to rely on specific words. The convolution has a similar effect, identifying phrases/n-grams as opposed to just individual words; an LSTM then follows to learn sequences of phrases/n-grams that may be useful features; the GlobalMaxPooling1D layer then 'flattens' the LSTM output into features for the final classification (the Dense layer).
Does this make any sense? I also do not quite understand the interaction between the MaxPooling1D layer and the LSTM layer. What is the input_shape to the LSTM, and what does its output look like?
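A minimal usage sketch for inspecting the per-layer output shapes of the model above (assuming the create_model() as written compiles):
model = create_model()
model.summary()  # prints each layer's output shape, e.g. the LSTM gives (None, 25, 100) and GlobalMaxPooling1D gives (None, 100)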
Upvotes: 2
Views: 511
Reputation: 86600
Multiclass models:
The multiclass model ending with Dense(3, activation='sigmoid') is OK for a multiclass problem with 3 possible classes.
But it should only use 'categorical_crossentropy' if there is only one correct class among the 3, and in that case the activation function should be 'softmax'.
A 'softmax' guarantees that all the classes sum to 1. It's good when you want only one correct class.
A 'sigmoid' does not care about the relation between the 3 classes; they can coexist as all ones or all zeros. In that case, use 'binary_crossentropy'.
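A minimal, self-contained sketch of the two head/loss pairings (the input_shape=(100,) stands in for the GlobalMaxPooling1D output of the question's model):
from keras.models import Sequential
from keras.layers import Dense

# Option 1 - exactly one correct class among the 3: softmax + categorical_crossentropy
m1 = Sequential([Dense(3, activation='softmax', input_shape=(100,))])
m1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Option 2 - independent classes that can coexist: sigmoid + binary_crossentropy
m2 = Sequential([Dense(3, activation='sigmoid', input_shape=(100,))])
m2.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])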
LSTM and GlobalMaxPooling:
The LSTM input is (batchSize, timeSteps, featuresOrDimension).
The output can be one of two:
return_sequences=True: (batchSize, timeSteps, units)
return_sequences=False: (batchSize, units)
Since you chose the True case, there is still the timeSteps dimension, and the GlobalMaxPooling1D will take the highest value in that dimension and discard the others, resulting in (batchSize, units).
It's pretty much like using only LSTM(units, return_sequences=False). But that one takes the last step in the sequence, while the max pooling takes the maximum step.
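A toy sketch to see those shapes (sizes chosen to match the question's model: 25 time steps, 100 features):
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, GlobalMaxPooling1D

m = Sequential()
m.add(LSTM(units=100, return_sequences=True, input_shape=(25, 100)))  # -> (batchSize, 25, 100)
m.add(GlobalMaxPooling1D())                                           # -> (batchSize, 100)
print(m.predict(np.zeros((2, 25, 100))).shape)  # (2, 100)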
Upvotes: 0
Reputation: 40516
So, your intuition is right. Everything you said holds. About MaxPooling1D - it's a way to downsample the output from Conv1D. The output from this layer will be 4 times smaller than the original output from Conv1D (so the input to the LSTM will have a length of 25 with the same number of features). Just to show you how it works:
output from Conv1D:
0, 1, 1, 0, -1, 2, 3, 5, 1, 2, 1, -1
input to LSTM:
1 (max from 0, 1, 1, 0), 5 (max from -1, 2, 3, 5), 2 (max from 1, 2, 1, -1)
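As a quick sanity check, a minimal sketch reproducing that toy example with a single feature channel:
import numpy as np
from keras.models import Sequential
from keras.layers import MaxPooling1D

# 1 sample, 12 time steps, 1 feature; pool_size=4 keeps the max of each window of 4 steps
x = np.array([0, 1, 1, 0, -1, 2, 3, 5, 1, 2, 1, -1], dtype='float32').reshape(1, 12, 1)
m = Sequential([MaxPooling1D(pool_size=4, input_shape=(12, 1))])
print(m.predict(x).ravel())  # [1. 5. 2.]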
Edit
I hadn't noticed the categorical_crossentropy and the activations. So:
If your output is one out of 3 classes, you could use categorical_crossentropy with sigmoid, but then your output cannot be interpreted as a probability distribution, only as class scores (the prediction is the class with the highest score). The better option is to use softmax, which produces a probability distribution over the classes.
In the case of a 3-class prediction where the classes are not mutually exclusive, you should use binary_crossentropy due to the Keras implementation, even though it's mathematically equivalent to categorical_crossentropy. That is because, with categorical_crossentropy, keras normalizes the outputs from the last layer to make them sum up to 1. This might seriously harm your training.
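To make the softmax/sigmoid difference concrete, a small numeric sketch with made-up scores:
import numpy as np

scores = np.array([2.0, 1.0, 0.1])               # arbitrary class scores
softmax = np.exp(scores) / np.exp(scores).sum()  # a distribution over the classes
sigmoid = 1.0 / (1.0 + np.exp(-scores))          # independent per-class scores
print(softmax, softmax.sum())  # roughly [0.66 0.24 0.10], sums to 1.0
print(sigmoid, sigmoid.sum())  # roughly [0.88 0.73 0.52], sums to about 2.14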
Upvotes: 2