maybeyourneighour

Reputation: 494

Confusion matrix only contains predictions for classes 0 and 1

I built the following LSTM network and it runs fine, but it only reaches about 60% accuracy. I think the reason is that it only ever predicts labels 0 and 1, never 2 or 3: the confusion matrix has zeros in the columns for classes 2 and 3.

import keras
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from keras.callbacks import ModelCheckpoint
from keras.layers import Input, Dense, Dropout, Embedding, LSTM, Flatten
from keras.models import Model, Sequential
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer
from keras.utils import to_categorical
from sklearn.metrics import (accuracy_score, cohen_kappa_score, confusion_matrix,
                             f1_score, precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

%matplotlib inline
plt.style.use('ggplot')

data = pd.read_csv("dataset/train_set.csv", sep="\t")


data['num_words'] = data.Text.apply(lambda x : len(x.split()))


num_class = len(np.unique(data.Label.values)) # 4
y = data['Label'].values


MAX_LEN = 300
tokenizer = Tokenizer()
tokenizer.fit_on_texts(data.Text.values)


post_seq = tokenizer.texts_to_sequences(data.Text.values)
post_seq_padded = pad_sequences(post_seq, maxlen=MAX_LEN)


X_train, X_test, y_train, y_test = train_test_split(post_seq_padded, y, test_size=0.25)


vocab_size = len(tokenizer.word_index) +1 


inputs = Input(shape=(MAX_LEN, ))
embedding_layer = Embedding(vocab_size,
                            128,
                            input_length=MAX_LEN)(inputs)

x = LSTM(64)(embedding_layer)
x = Dense(32, activation='relu')(x)
predictions = Dense(num_class, activation='softmax')(x)
model = Model(inputs=[inputs], outputs=predictions)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['acc'])

model.summary()

filepath="weights.hdf5"
checkpointer = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
history = model.fit([X_train], batch_size=64, y=to_categorical(y_train), verbose=1, validation_split=0.25, 
          shuffle=True, epochs=10, callbacks=[checkpointer])

df = pd.DataFrame({'epochs':history.epoch, 'accuracy': history.history['acc'], 'validation_accuracy': history.history['val_acc']})
# fit_reg is a regplot/lmplot option, not a pointplot one, so it is dropped here
g = sns.pointplot(x="epochs", y="accuracy", data=df)
g = sns.pointplot(x="epochs", y="validation_accuracy", data=df, color='green')

model.load_weights('weights.hdf5')

# predict class probabilities for the test set (one column per class)
yhat_probs = model.predict(X_test, verbose=0)
# reduce to a 1d array of predicted class indices
yhat_classes = np.argmax(yhat_probs, axis=1)

# print the score itself, not the function object
print(accuracy_score(y_test, yhat_classes))

# accuracy: (tp + tn) / (p + n)
accuracy = accuracy_score(y_test, yhat_classes)
print('Accuracy: %f' % accuracy)
# precision tp / (tp + fp)
precision = precision_score(y_test, yhat_classes, average='micro')
print('Precision: %f' % precision)
# recall: tp / (tp + fn)
recall = recall_score(y_test, yhat_classes, average='micro')
print('Recall: %f' % recall)
# f1: 2 tp / (2 tp + fp + fn)
f1 = f1_score(y_test, yhat_classes, average='micro')
print('F1 score: %f' % f1)
matrix = confusion_matrix(y_test, yhat_classes) 
print(matrix)

Confusion matrix:

[[324 146   0   0]
 [109 221   0   0]
 [ 55  34   0   0]
 [ 50  16   0   0]]

The average is set to 'micro', and the output layer has four nodes for the four classes. Evaluated on the training set only, the metrics are as follows (class 2 is sometimes predicted, but class 3 never):

Accuracy: 0.888539
Precision: 0.888539
Recall: 0.888539

Does anyone know why this happens?

Upvotes: 0

Views: 970

Answers (1)

ivallesp

Reputation: 2222

The model has probably gotten stuck in a suboptimal solution. In your problem, classes 0 and 1 account for about 85% of all instances, so the dataset is quite imbalanced. The model predicts only classes 0 and 1 because it has not fully converged, which is a classic failure mode for this kind of model. Informally, you can think of the model as being lazy: guessing only the majority classes is an easy way to get a decent loss early on. I would recommend the following:

  • Train longer.
  • Check whether your model can overfit your training data. To do that, train longer and measure the training error. If there is no major problem in your model or your data, the model will eventually predict classes 2 and 3, at least on the training set. Once you see that, you can rule out a problem in your data or model.
  • Use batch normalization. In practice I have seen it help get rid of this failure mode.
  • Always use a bit of dropout. It helps regularize the model.
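
To make the imbalance and the collapse onto classes 0 and 1 visible, here is a minimal sketch that uses only NumPy and the confusion matrix posted in the question. The variable names are my own and are not part of the original code. For single-label multiclass problems, micro-averaged precision, recall and F1 all reduce to plain accuracy, which is why the numbers reported in the question are identical. Per-class or macro-averaged values show the missing classes much more clearly.

```python
import numpy as np

# Confusion matrix copied from the question
# (rows = true class, columns = predicted class).
cm = np.array([
    [324, 146, 0, 0],
    [109, 221, 0, 0],
    [55, 34, 0, 0],
    [50, 16, 0, 0],
])

support = cm.sum(axis=1)                 # true samples per class
class_share = support / support.sum()    # fraction of the test set per class
per_class_recall = np.diag(cm) / support # correct predictions / true samples

# Micro averaging pools all samples, so it equals overall accuracy.
micro_recall = np.diag(cm).sum() / cm.sum()
# Macro averaging weights every class equally, so ignored classes hurt.
macro_recall = per_class_recall.mean()

print("share of each class:", np.round(class_share, 3))
print("recall per class:   ", np.round(per_class_recall, 3))
print("micro recall:       ", round(float(micro_recall), 3))
print("macro recall:       ", round(float(macro_recall), 3))
```

If you track the per-class recall while training longer, you can see directly whether classes 2 and 3 start being predicted. With scikit-learn, passing `average='macro'` or `average=None` to `recall_score` gives the same kind of view.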

Upvotes: 1
