Reputation: 125
I am building a model to classify webpage titles into one of 101 food-related classes (most of the titles are recipes). The average length of my sequences is 42. I cleaned the text (removed unwanted words, converted to lowercase, etc.) and tokenized it using a Tokenizer. My model has an LSTM layer and reaches 83% accuracy on the test set. I'm pretty sure this can be improved by making some changes to the network; do you have any suggestions? Thank you in advance! This is my model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SpatialDropout1D, LSTM, Dense
from tensorflow.keras import optimizers

model = Sequential()
model.add(Embedding(MAX_NB_WORDS, EMBEDDING_DIM, input_length=x_train.shape[1]))
model.add(SpatialDropout1D(0.2))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(101, activation='softmax'))  # one output unit per class
opt = optimizers.Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.999, amsgrad=False)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
Upvotes: 0
Views: 275
Reputation: 14983
One thing you could try is adding another LSTM layer, but pay attention to the number of units: increasing it too much can easily lead to overfitting. Otherwise, gradually reducing the learning rate when training reaches a plateau could also help push the accuracy up.
If you add a second LSTM layer, do not forget to set return_sequences=True on the first one.
You should also reserve a validation set for monitoring metrics during training (separate from the test set).
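For the plateau-based learning-rate reduction and the validation set, Keras's ReduceLROnPlateau callback combined with validation_split in fit covers both; the factor, patience, epochs, batch size, and split values below are assumptions to adjust for your data:

from tensorflow.keras.callbacks import ReduceLROnPlateau

# halve the learning rate whenever val_loss stops improving for 3 epochs
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=1e-5)

history = model.fit(x_train, y_train,
                    epochs=30,
                    batch_size=64,
                    validation_split=0.1,  # hold out 10% of training data for validation
                    callbacks=[reduce_lr])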
Upvotes: 1