Reputation: 95
I am new to LSTMs, so I tried writing a simple Sentiment Classification script in Keras. However, I am unable to make sense of the output.
Here is my Sentiment Classifier code:
import keras
from keras.models import Sequential
from keras.layers import Dense, Activation, LSTM, Embedding
from keras.callbacks import EarlyStopping, ModelCheckpoint
es = EarlyStopping(monitor='val_loss', patience=5)
ckpt = ModelCheckpoint('weights.hdf5', save_best_only=True, save_weights_only=True, monitor='val_accuracy')
model = Sequential()
model.add(Embedding(3200,128))
model.add(LSTM(128, dropout=0.3, recurrent_dropout=0.3))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer='adam')
model.fit(features,target, validation_split=0.2, epochs=ep, batch_size=bs, callbacks=[es, ckpt])
And here is my Sentiment Prediction Code:
def predict_on_test_sentences(model, sent):
    from keras.preprocessing.text import Tokenizer
    from keras.preprocessing import sequence

    t = Tokenizer()
    t.fit_on_texts(sent)
    test_converted = t.texts_to_sequences(sent)

    padded_text = sequence.pad_sequences(test_converted, padding='post', maxlen=32)
    assert padded_text.shape[1] == 32

    y_prob = model.predict(padded_text)
    y_class = y_prob.argmax(axis=-1)

    print("Probabilities :\n{}\nClass : {}".format(y_prob, y_class))
    print(len(y_class), len(y_prob))
And this is how I call it on a test sentence:
predict_on_test_sentences(model, "I absolutely love the food here. The service is great")
Finally, this is my output:
Probabilities :
[[0.05458272]
[0.03890216]
[0.01066688]
[0.00394785]
[0.08322579]
[0.9882582 ]
[0.8437737 ]
[0.02924034]
[0.1741887 ]
[0.00972039]
[0.8437737 ]
[0.9607595 ]
[0.03890216]
[0.8437737 ]
[0.9882582 ]
[0.69985855]
[0.00972039]
[0.03890216]
[0.1741887 ]
[0.0162347 ]
[0.00972039]
[0.03890216]
[0.01420724]
[0.9882582 ]
[0.9882582 ]
[0.02542651]
[0.03890216]
[0.0162347 ]
[0.00972039]
[0.05820051]
[0.00972039]
[0.03890216]
[0.03890216]
[0.1741887 ]
[0.0162347 ]
[0.00972039]
[0.03890216]
[0.08322579]
[0.00972039]
[0.05820051]
[0.69985855]
[0.05458272]
[0.92422444]
[0.00972039]
[0.03890216]
[0.05458272]
[0.08322579]
[0.03890216]
[0.9990741 ]
[0.05820051]
[0.00972039]
[0.01066688]
[0.17418873]]
Class : [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
53 53
Can anyone help make sense of the output? What steps do I take next so that I can classify a given review as 0/1 (Negative/Positive)? And an explanation of what I am doing wrong / what I can improve would be great too, thanks!
Upvotes: 3
Views: 2882
Reputation: 6921
You're doing binary classification with your layer Dense(1, activation="sigmoid").
With a sigmoid activation, your output is a single number between 0 and 1 which you can interpret as the probability of your first class. Observations with output close to 0 are predicted to be from the first class and those with output close to 1 from the second class.
The cut-off point need not be 0.5 (cf. ROC curves), but it is a sensible value when interpreting the output as a probability, since P(class2) = 1 - P(class1).
Contrary to what another answer says, there is no need to use Dense(2, activation="softmax") for binary classification. Your approach is better.
However, you don't make predictions with argmax on a sigmoid-activated output. The argmax of a single value is always 0. Instead, you want to compare each probability with your cut-off point, typically 0.5.
For example, looking at your first 7 sentences:
[[0.05458272]
[0.03890216]
[0.01066688]
[0.00394785]
[0.08322579]
[0.9882582 ]
[0.8437737 ]]
The predicted classes are [0 0 0 0 0 1 1].
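Concretely, thresholding at 0.5 reproduces those classes. A minimal NumPy sketch, using the probabilities copied from the output above:

```python
import numpy as np

# Sigmoid outputs for the first 7 predictions, shape (7, 1) like model.predict returns
y_prob = np.array([[0.05458272], [0.03890216], [0.01066688], [0.00394785],
                   [0.08322579], [0.9882582], [0.8437737]])

# Compare each probability with the cut-off, not argmax
y_class = (y_prob > 0.5).astype(int).ravel()
print(y_class)  # [0 0 0 0 0 1 1]
```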
Of course, your real problem is that you didn't ask for predictions on that many sentences, only on one. The issue is that the texts argument of the XXX_texts functions (fit_on_texts, texts_to_sequences) should be a list of sentences. When you pass a single string instead, it is treated as a list of single characters.
Right now, you get a sequence, and a prediction, for each letter!
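You can check this character-by-character behaviour directly: iterating over a plain Python string yields characters, and the test sentence happens to have exactly 53 of them, matching the 53 probabilities printed above.

```python
sent = "I absolutely love the food here. The service is great"

# A string is an iterable of characters, so code that loops over `texts`
# sees 53 one-character "sentences" here:
print(list(sent)[:3])  # ['I', ' ', 'a']
print(len(sent))       # 53
```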
Change test_converted = t.texts_to_sequences(sent) to test_converted = t.texts_to_sequences([sent]) and you're good.
Remember, it is texts_to_sequences (note the plurals), not text_to_sequence!
Additionally, you need to use the training tokenizer on your test data! Otherwise the tokens are different and you will get nonsensical results.
For example, your training tokenizer might encode "movie" as token 123, but for your test tokenizer, token 123 could be "actor". If you don't use the same word index, the test sentences become gibberish to your model.
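A toy illustration of that mismatch, in pure Python with no Keras. Here build_index is a hypothetical stand-in for Tokenizer.fit_on_texts, assigning integer ids starting from 1 as Keras does:

```python
def build_index(texts):
    """Assign each new word the next integer id, in order of first appearance."""
    index = {}
    for text in texts:
        for word in text.lower().split():
            if word not in index:
                index[word] = len(index) + 1
    return index

# Fitted on different corpora, the same word gets different ids
train_index = build_index(["i love the food", "the service is great"])
test_index = build_index(["great food and great service"])

print(train_index["great"], test_index["great"])  # 7 1
```

Because "great" maps to 7 in one index and 1 in the other, sequences produced by a tokenizer fitted on the test sentence are meaningless to a model trained with the training tokenizer's index.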
Upvotes: 4
Reputation: 2453
This is because you are using word embedding. To get your output, use:
model.predict(padded_text)[0]
However, for classification purposes, you should go for an output shape of (2,) and use a softmax activation, so that you output a vector of certainties about your input being positive or negative. Then, an argmax on this output vector would yield the class your network thinks the input belongs to.
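For reference, the softmax-plus-argmax mechanics described here can be sketched in plain Python (the logits are hypothetical values standing in for a Dense(2) output):

```python
import math

# Hypothetical raw outputs (logits) of a 2-unit output layer
logits = [1.2, -0.3]

# Softmax turns logits into probabilities that sum to 1
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

# argmax picks the class with the highest probability
pred = max(range(len(probs)), key=probs.__getitem__)
print(pred)  # 0
```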
Upvotes: -1