Reputation: 494
I built the following LSTM network and it runs fine, although it reaches only about 60% test accuracy. I think the problem is that it only ever predicts labels 0 and 1, never 2 and 3, since the confusion matrix has all zeros for classes 2 and 3.
import keras
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential, Model
from keras.layers import Input, Dense, Dropout, Embedding, LSTM, Flatten
from keras.utils import to_categorical
from keras.callbacks import ModelCheckpoint
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score, roc_auc_score,
                             confusion_matrix)
plt.style.use('ggplot')
%matplotlib inline
# Load the data, tokenize the text and pad every post to MAX_LEN tokens
data = pd.read_csv("dataset/train_set.csv", sep="\t")
data['num_words'] = data.Text.apply(lambda x: len(x.split()))
num_class = len(np.unique(data.Label.values))  # 4
y = data['Label'].values
MAX_LEN = 300
tokenizer = Tokenizer()
tokenizer.fit_on_texts(data.Text.values)
post_seq = tokenizer.texts_to_sequences(data.Text.values)
post_seq_padded = pad_sequences(post_seq, maxlen=MAX_LEN)
X_train, X_test, y_train, y_test = train_test_split(post_seq_padded, y, test_size=0.25)
vocab_size = len(tokenizer.word_index) + 1
# Embedding -> LSTM -> dense classifier with a four-way softmax output
inputs = Input(shape=(MAX_LEN, ))
embedding_layer = Embedding(vocab_size,
                            128,
                            input_length=MAX_LEN)(inputs)
x = LSTM(64)(embedding_layer)
x = Dense(32, activation='relu')(x)
predictions = Dense(num_class, activation='softmax')(x)
model = Model(inputs=[inputs], outputs=predictions)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['acc'])
model.summary()
filepath="weights.hdf5"
checkpointer = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
history = model.fit([X_train], batch_size=64, y=to_categorical(y_train), verbose=1, validation_split=0.25,
shuffle=True, epochs=10, callbacks=[checkpointer])
df = pd.DataFrame({'epochs':history.epoch, 'accuracy': history.history['acc'], 'validation_accuracy': history.history['val_acc']})
g = sns.pointplot(x="epochs", y="accuracy", data=df, fit_reg=False)
g = sns.pointplot(x="epochs", y="validation_accuracy", data=df, fit_reg=False, color='green')
# Reload the best checkpoint and evaluate on the held-out test set
model.load_weights('weights.hdf5')
predicted = np.argmax(model.predict(X_test), axis=1)
print(accuracy_score(y_test, predicted))
# Predict class probabilities for the test set and derive the predicted classes
yhat_probs = model.predict(X_test, verbose=0)
yhat_classes = np.argmax(yhat_probs, axis=1)
# accuracy: (tp + tn) / (p + n)
accuracy = accuracy_score(y_test, yhat_classes)
print('Accuracy: %f' % accuracy)
# precision tp / (tp + fp)
precision = precision_score(y_test, yhat_classes, average='micro')
print('Precision: %f' % precision)
# recall: tp / (tp + fn)
recall = recall_score(y_test, yhat_classes, average='micro')
print('Recall: %f' % recall)
# f1: 2 tp / (2 tp + fp + fn)
f1 = f1_score(y_test, yhat_classes, average='micro')
print('F1 score: %f' % f1)
matrix = confusion_matrix(y_test, yhat_classes)
print(matrix)
confusion matrix:
[[324 146   0   0]
 [109 221   0   0]
 [ 55  34   0   0]
 [ 50  16   0   0]]
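The zero columns for classes 2 and 3 already point to skew in the labels themselves; a quick check on the same data frame (a sketch, reusing the data loaded above) would show the per-class share:
# proportion of samples per label
print(data.Label.value_counts(normalize=True))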
The average is set to 'micro' and the output layer has four nodes, one per class. These are the accuracy, precision and recall on the train_set alone (there class 2 is sometimes predicted, but class 3 not once):
Accuracy: 0.888539
Precision: 0.888539
Recall: 0.888539
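These three values are identical by construction: with 'micro' averaging on single-label multiclass data, precision, recall and F1 pool all predictions and collapse to plain accuracy, so they cannot show that two classes are never predicted. A per-class breakdown makes it visible, e.g. (a sketch reusing y_test and yhat_classes from above):
from sklearn.metrics import classification_report
# one precision/recall/F1 row per class instead of one pooled number
print(classification_report(y_test, yhat_classes, digits=3))
# or the per-class scores directly
print(precision_score(y_test, yhat_classes, average=None))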
Does anyone know why this happens?
Upvotes: 0
Views: 970
Reputation: 2222
It may be that the model gets stuck in a suboptimal solution. In your problem, classes 0 and 1 represent 85% of the instances, so the dataset is quite imbalanced. The model predicts only classes 0 and 1 because it hasn't fully converged, and this is a classic failure mode for this kind of model. Informally, you can think of it as the model being lazy... What I would recommend is to address this class imbalance directly.
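One common way to do that, for example, is to reweight the loss by inverse class frequency; a minimal sketch, assuming the y_train, model and checkpointer from the question, using Keras' class_weight argument:
from sklearn.utils.class_weight import compute_class_weight
# 'balanced' weights each class inversely to its frequency, so the rare
# classes 2 and 3 contribute more to the loss
classes = np.unique(y_train)
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y_train)
class_weights = {int(c): w for c, w in zip(classes, weights)}
# same fit call as in the question, plus class_weight
history = model.fit([X_train], y=to_categorical(y_train), batch_size=64,
                    validation_split=0.25, shuffle=True, epochs=10,
                    verbose=1, callbacks=[checkpointer],
                    class_weight=class_weights)
Oversampling the rare classes or simply training for more epochs (since the model has not fully converged) are alternatives in the same spirit.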
Upvotes: 1