Reputation: 1117
I am working on a classification problem with the SemEval 2017 Task 4A dataset (can be found here), and I am using a deep LSTM network for it. In pre-processing I do lower casing -> tokenization -> lemmatization -> stop-word removal -> punctuation removal. For word embeddings I use a Word2Vec model. There are 18,000 samples in my training set and 2,000 in my test set.
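Roughly, the pre-processing looks like this (a sketch, assuming NLTK and gensim; train_texts is a placeholder for my list of raw tweets, not my exact code):
import string
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
from gensim.models import Word2Vec

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))

def preprocess(text):
    tokens = word_tokenize(text.lower())                         # lower casing + tokenization
    tokens = [lemmatizer.lemmatize(t) for t in tokens]           # lemmatization
    tokens = [t for t in tokens if t not in stop_words]          # remove stop words
    return [t for t in tokens if t not in string.punctuation]    # remove punctuation

corpus = [preprocess(tweet) for tweet in train_texts]            # train_texts: placeholder for the raw tweets
w2v = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=2)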
The code for my model is:
from keras.models import Sequential
from keras.layers import Embedding, BatchNormalization, Activation, Dropout, Bidirectional, LSTM, Dense
from keras_self_attention import SeqSelfAttention

model = Sequential()
model.add(Embedding(max_words, 30, input_length=max_len))   # vocabulary of max_words, 30-dim embeddings
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(Dropout(0.3))
model.add(Bidirectional(LSTM(32, use_bias=True, return_sequences=True)))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Bidirectional(LSTM(32, use_bias=True, return_sequences=True), input_shape=(128, 1, 64)))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(SeqSelfAttention(attention_activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))
model.summary()
The value of max_words is 2000 and max_len is 300.
But even with all of this, my test accuracy does not cross 50%, and I can't figure out the problem. PS: I am using a validation split as well. The loss function is binary crossentropy and the optimizer is Adam.
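For reference, this is roughly how I compile and train (a sketch; the batch size, epoch count, and the X_train/y_train/X_test/y_test names are placeholders for my actual arrays):
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X_train, y_train,
                    batch_size=64,           # placeholder value
                    epochs=10,               # placeholder value
                    validation_split=0.2)    # the validation I mentioned
score = model.evaluate(X_test, y_test)       # the 2,000 held-out test samples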
Upvotes: 2
Views: 486
Reputation: 576
Training "LSTM" is very different with other common deep learning model. I recommend a higher dropout rate like 0.7,0.8. and Adam optimizer is particularly unstable in LSTM with real world data. So, i recommend SGD scheduled for a momentum of 0.9 and ReduceLROnPlateau. You have to do very long training, and if spark loss is observed, the training is going very well. (Spark Loss is a word used by NVIDIA researchers. It refers to a phenomenon in which the value of Loss that appears to converge increases significantly.)
Upvotes: 0