Reputation: 95
I am new to LSTMs, so I tried writing a simple Sentiment Classification script in Keras. However, I am unable to make sense of the output.
Here is my Sentiment Classifier code:
import keras
from keras.models import Sequential
from keras.layers import Dense, Activation, LSTM, Embedding
from keras.callbacks import EarlyStopping, ModelCheckpoint
es = EarlyStopping(monitor='val_loss', patience=5)
ckpt = ModelCheckpoint('weights.hdf5', save_best_only=True, save_weights_only=True, monitor='val_accuracy')
model = Sequential()
model.add(Embedding(3200,128))
model.add(LSTM(128, dropout=0.3, recurrent_dropout=0.3))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer='adam')
model.fit(features,target, validation_split=0.2, epochs=ep, batch_size=bs, callbacks=[es, ckpt])
And here is my Sentiment Prediction Code:
def predict_on_test_sentences(model, sent):
    from keras.preprocessing.text import Tokenizer
    from keras.preprocessing import sequence

    t = Tokenizer()
    t.fit_on_texts(sent)
    test_converted = t.texts_to_sequences(sent)

    padded_text = sequence.pad_sequences(test_converted, padding='post', maxlen=32)
    assert padded_text.shape[1] == 32

    y_prob = model.predict(padded_text)
    y_class = y_prob.argmax(axis=-1)

    print("Probabilities :\n{}\nClass : {}".format(y_prob, y_class))
    print(len(y_class), len(y_prob))
And this is how I call it on a test sentence:
predict_on_test_sentences(model, "I absolutely love the food here. The service is great")
Finally, this is my output:
Probabilities :
[[0.05458272]
[0.03890216]
[0.01066688]
[0.00394785]
[0.08322579]
[0.9882582 ]
[0.8437737 ]
[0.02924034]
[0.1741887 ]
[0.00972039]
[0.8437737 ]
[0.9607595 ]
[0.03890216]
[0.8437737 ]
[0.9882582 ]
[0.69985855]
[0.00972039]
[0.03890216]
[0.1741887 ]
[0.0162347 ]
[0.00972039]
[0.03890216]
[0.01420724]
[0.9882582 ]
[0.9882582 ]
[0.02542651]
[0.03890216]
[0.0162347 ]
[0.00972039]
[0.05820051]
[0.00972039]
[0.03890216]
[0.03890216]
[0.1741887 ]
[0.0162347 ]
[0.00972039]
[0.03890216]
[0.08322579]
[0.00972039]
[0.05820051]
[0.69985855]
[0.05458272]
[0.92422444]
[0.00972039]
[0.03890216]
[0.05458272]
[0.08322579]
[0.03890216]
[0.9990741 ]
[0.05820051]
[0.00972039]
[0.01066688]
[0.17418873]]
Class : [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
53 53
Can anyone help make sense of the output? What steps do I take next so that I can classify a given review as 0/1 (Negative/Positive)? And an explanation of what I am doing wrong / what I can improve would be great too, thanks!
Upvotes: 3
Views: 2882
Reputation: 6921
You're doing binary classification with your layer Dense(1, activation="sigmoid").
With a sigmoid activation, your output is a single number between 0 and 1 which you can interpret as the probability of your first class. Observations with output close to 0 are predicted to be from the first class and those with output close to 1 from the second class.
The cut-off point need not be 0.5 (cf. ROC curves), but it is a sensible value when interpreting the output as a probability, since P(class2) = 1 - P(class1).
Contrary to what another answer says, there is no need to use Dense(2, activation="softmax") for binary classification. Your approach is better.
However, you don't make predictions with argmax on a sigmoid-activated output. The argmax of a single value is always 0. Instead, you want to compare each probability with your cut-off point, typically 0.5.
For example, looking at your first 7 sentences:
[[0.05458272]
[0.03890216]
[0.01066688]
[0.00394785]
[0.08322579]
[0.9882582 ]
[0.8437737 ]]
The predicted classes are [0 0 0 0 0 1 1].
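Concretely, thresholding at 0.5 reproduces those classes. A minimal NumPy sketch, using the probabilities copied from the output above:

```python
import numpy as np

# Sigmoid outputs for the first 7 predictions, shape (7, 1) like model.predict returns
y_prob = np.array([[0.05458272], [0.03890216], [0.01066688], [0.00394785],
                   [0.08322579], [0.9882582], [0.8437737]])

# Compare each probability with the cut-off, not argmax
y_class = (y_prob > 0.5).astype(int).ravel()
print(y_class)  # [0 0 0 0 0 1 1]
```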
Of course, your real problem is that you didn't ask for predictions on that many sentences, only on one. The issue is that the texts argument of the XXX_texts functions (fit_on_texts, texts_to_sequences) should be a list of sentences. When you pass a single string instead, it is treated as a list of single characters.
Right now, you get a sequence, and a prediction, for each letter!
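You can check this character-by-character behaviour directly: iterating over a plain Python string yields characters, and the test sentence happens to have exactly 53 of them, matching the 53 probabilities printed above.

```python
sent = "I absolutely love the food here. The service is great"

# A string is an iterable of characters, so code that loops over `texts`
# sees 53 one-character "sentences" here:
print(list(sent)[:3])  # ['I', ' ', 'a']
print(len(sent))       # 53
```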
Change test_converted = t.texts_to_sequences(sent) to test_converted = t.texts_to_sequences([sent]) and you're good.
Remember, it is texts_to_sequences (note the plurals), not text_to_sequence!
Additionally, you need to use the training tokenizer on your test data! Otherwise the tokens are different and you will get nonsensical results.
For example, your training tokenizer might encode "movie" as token 123, but for your test tokenizer, token 123 could be "actor". If you don't use the same word index, the test sentences become gibberish to your model.
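A toy illustration of that mismatch, in pure Python with no Keras. Here build_index is a hypothetical stand-in for Tokenizer.fit_on_texts, assigning integer ids starting from 1 as Keras does:

```python
def build_index(texts):
    """Assign each new word the next integer id, in order of first appearance."""
    index = {}
    for text in texts:
        for word in text.lower().split():
            if word not in index:
                index[word] = len(index) + 1
    return index

# Fitted on different corpora, the same word gets different ids
train_index = build_index(["i love the food", "the service is great"])
test_index = build_index(["great food and great service"])

print(train_index["great"], test_index["great"])  # 7 1
```

Because "great" maps to 7 in one index and 1 in the other, sequences produced by a tokenizer fitted on the test sentence are meaningless to a model trained with the training tokenizer's index.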
Upvotes: 4
Reputation: 2453
This is because you are using word embedding. To get your output, use:
model.predict(padded_text)[0]
However, for classification purposes, you should go for an output shape of (2,) and use a softmax activation, so that you output a vector of certainties about your input being positive or negative. Then, an argmax on this output vector would yield the class your network thinks the input belongs to.
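For reference, the softmax-plus-argmax mechanics described here can be sketched in plain Python (the logits are hypothetical values standing in for a Dense(2) output):

```python
import math

# Hypothetical raw outputs (logits) of a 2-unit output layer
logits = [1.2, -0.3]

# Softmax turns logits into probabilities that sum to 1
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

# argmax picks the class with the highest probability
pred = max(range(len(probs)), key=probs.__getitem__)
print(pred)  # 0
```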
Upvotes: -1