angelo curti giardina

Reputation: 73

LSTM Text Classification Bad Accuracy Keras

I'm going crazy with this project. It is multi-label text classification with an LSTM in Keras. My model is this:

from keras.models import Sequential
from keras.layers import Embedding, Dropout, LSTM, Dense, Activation
import keras

model = Sequential()
model.add(Embedding(max_features, embeddings_dim, input_length=max_sent_len,
                    mask_zero=True, weights=[embedding_weights]))
model.add(Dropout(0.25))
model.add(LSTM(units=embeddings_dim, activation='sigmoid',
               recurrent_activation='hard_sigmoid', return_sequences=True))
model.add(Dropout(0.25))
model.add(LSTM(units=embeddings_dim, activation='sigmoid',
               recurrent_activation='hard_sigmoid', return_sequences=False))
model.add(Dropout(0.25))
model.add(Dense(num_classes))
model.add(Activation('sigmoid'))

adam = keras.optimizers.Adam(lr=0.04)
model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])

The problem is that the accuracy is too low. With binary_crossentropy I get good accuracy, but the predictions are wrong. Switching to categorical_crossentropy, the accuracy becomes very low. Do you have any suggestions?

Here is my code: GitHubProject - Multi-Label-Text-Classification

Upvotes: 5

Views: 2297

Answers (2)

Upasana Mittal

Reputation: 2680

In the last layer, the activation function you are using is sigmoid, so binary_crossentropy should be used. In case you want to use categorical_crossentropy, use softmax as the activation function in the last layer.
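For reference, here is a minimal sketch of the two valid pairings (num_classes and the optimizer choice are illustrative, not taken from the original post):

# Multi-label: each class is an independent yes/no decision
model.add(Dense(num_classes, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Multi-class (exactly one label per sample): classes compete via softmax
model.add(Dense(num_classes, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])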

Now, coming to the other part of your model: since you are working with text, I would suggest going for tanh as the activation function in the LSTM layers.

You can also try the LSTM's built-in dropouts, dropout and recurrent_dropout:

LSTM(units, dropout=0.2, recurrent_dropout=0.2, activation='tanh')

You can set units to 64 or 128. Start with a small number and, after testing, increase it up to 1024.

You can also try adding a convolutional layer for extracting features, or use a Bidirectional LSTM, though Bidirectional-based models take longer to train.
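A rough sketch of that idea, assuming the same max_features, embeddings_dim, max_sent_len and num_classes as in the question (the filter and unit counts are illustrative):

from keras.models import Sequential
from keras.layers import Embedding, Conv1D, MaxPooling1D, Bidirectional, LSTM, Dense

model = Sequential()
# No mask_zero here: Conv1D does not support masking
model.add(Embedding(max_features, embeddings_dim, input_length=max_sent_len))
model.add(Conv1D(filters=64, kernel_size=3, activation='relu'))  # local n-gram features
model.add(MaxPooling1D(pool_size=2))
model.add(Bidirectional(LSTM(64, dropout=0.2, recurrent_dropout=0.2, activation='tanh')))
model.add(Dense(num_classes, activation='sigmoid'))  # sigmoid + binary_crossentropy for multi-label
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])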

Moreover, since you are working with text, text pre-processing and the size of the training data always play a much bigger role than expected.

Edited

Add class weights via the class_weight parameter of fit:

from sklearn.utils import class_weight
import numpy as np

class_weights = class_weight.compute_class_weight('balanced',
                                                  np.unique(labels),
                                                  labels)
# le is the LabelEncoder used to encode the labels
class_weights_dict = dict(zip(le.transform(list(le.classes_)),
                              class_weights))

model.fit(x_train, y_train, validation_split=0.1,  # e.g. hold out 10% for validation
          class_weight=class_weights_dict)
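(Note: recent scikit-learn versions require keyword arguments here, i.e. compute_class_weight(class_weight='balanced', classes=np.unique(labels), y=labels).)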

Upvotes: 8

Ioannis Nasios

Reputation: 8527

change:

model.add(Activation('sigmoid'))

to:

model.add(Activation('softmax'))
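With softmax the outputs sum to 1, forming a probability distribution over the classes, which is what categorical_crossentropy expects; sigmoid outputs are independent per class and pair with binary_crossentropy instead.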

Upvotes: 4
