Reputation: 33
I am working on a document classification problem: multi-label classification with 20 different labels, 1,920 documents in training and 480 in validation. The model is a CNN with FastText embeddings, and as a baseline I use a logistic regression model with n-gram features. The problem is that the baseline gives an F1-score of 0.36 while the CNN only reaches 0.30.
The architecture I use is from here:
https://www.kaggle.com/vsmolyakov/keras-cnn-with-fasttext-embeddings
I have been doing some parameter tuning, and the current best parameters are: dropout 0.25, learning rate 0.001, trainable embeddings false, 128 filters, prediction threshold 0.15, and kernel size 9.
Do you have ideas about parameters to pay special attention to, suggestions for changing the architecture, or anything else that might improve the F1-score?
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, MaxPooling1D, GlobalMaxPooling1D, Dropout, Dense
from keras import optimizers, regularizers

# Parameters
BATCH_SIZE = 16
DROP_OUT = 0.25
N_EPOCHS = 20
N_FILTERS = 128
TRAINABLE = False
LEARNING_RATE = 0.001
N_DIM = 32
KERNEL_SIZE = 9
# Create model (NB_WORDS, EMBED_DIM, embedding_matrix, MAX_SEQ_LEN and N_LABELS
# come from the preprocessing step, not shown here)
model = Sequential()
model.add(Embedding(NB_WORDS, EMBED_DIM, weights=[embedding_matrix],
input_length=MAX_SEQ_LEN, trainable=TRAINABLE))
model.add(Conv1D(N_FILTERS, KERNEL_SIZE, activation='relu', padding='same'))
model.add(MaxPooling1D(2))
model.add(Conv1D(N_FILTERS, KERNEL_SIZE, activation='relu', padding='same'))
model.add(GlobalMaxPooling1D())
model.add(Dropout(DROP_OUT))
model.add(Dense(N_DIM, activation='relu', kernel_regularizer=regularizers.l2(1e-4)))
model.add(Dense(N_LABELS, activation='sigmoid')) #multi-label (k-hot encoding)
# Adam with the tuned learning rate; binary cross-entropy matches the sigmoid multi-label output
adam = optimizers.Adam(lr=LEARNING_RATE, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(loss='binary_crossentropy', optimizer=adam, metrics=['accuracy'])
model.summary()
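For context, a minimal sketch of how the prediction threshold and F1-score mentioned above could be computed; X_val and y_val are placeholder names for the padded validation sequences and their k-hot labels, and the micro-averaging choice is an assumption:

from sklearn.metrics import f1_score

# Sigmoid output per label, thresholded at the tuned value to get k-hot predictions
probs = model.predict(X_val)             # shape (n_samples, N_LABELS)
preds = (probs > 0.15).astype(int)

# Micro-averaged F1 across all 20 labels (averaging choice is an assumption)
print(f1_score(y_val, preds, average='micro'))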
Edit
I think I got some wrong hyperparameters by fixing the number of epochs to 20 during tuning. I am now using a stopping criterion instead; the model usually converges around 30-35 epochs. It seems a dropout of 0.5 works better, and I am currently tuning the batch size. If somebody has experience with the relationship between the number of epochs and other hyperparameters, feel free to share.
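A minimal sketch of such a stopping criterion with the standard Keras EarlyStopping callback (X_train/y_train are placeholder names; restore_best_weights needs Keras >= 2.2.3):

from keras.callbacks import EarlyStopping

# Stop once validation loss stops improving instead of fixing N_EPOCHS
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
model.fit(X_train, y_train,
          batch_size=BATCH_SIZE,
          epochs=100,                    # upper bound; early stopping decides
          validation_data=(X_val, y_val),
          callbacks=[early_stop])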
Upvotes: 0
Views: 1776
Reputation: 2134
A thing you should consider in general is whether the data is imbalanced and how your model performs for each class (using, for example, sklearn.metrics.confusion_matrix).
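For a multi-label target, a small sketch of such a per-class check (assuming X_val/y_val hold the validation data; multilabel_confusion_matrix is the multi-label counterpart of confusion_matrix and needs scikit-learn >= 0.21):

from sklearn.metrics import multilabel_confusion_matrix, classification_report

# One 2x2 confusion matrix per label exposes which classes are rarely predicted
preds = (model.predict(X_val) > 0.15).astype(int)   # threshold from the question
print(multilabel_confusion_matrix(y_val, preds))

# Per-label precision/recall/F1 and support make class imbalance visible
print(classification_report(y_val, preds))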
I think the dataset (about 2,000 documents over 20 labels) might not be big enough for deep learning to work from scratch. You could consider augmenting your dataset, or you could start by fine-tuning a pretrained language model for your task; see https://github.com/huggingface/pytorch-openai-transformer-lm. That could help you overcome the dataset-size issue in general.
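The linked repo is the early OpenAI transformer port; the same idea with the later transformers library (my substitution, not from the answer) looks roughly like this for a 20-label multi-label head:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Pretrained encoder with a fresh 20-way sigmoid head; problem_type selects
# a BCE-with-logits loss suited to multi-label targets
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=20,
    problem_type="multi_label_classification",
)

enc = tokenizer(["example document text"], truncation=True, padding=True, return_tensors="pt")
labels = torch.zeros((1, 20))            # k-hot float targets, one row per document
loss = model(**enc, labels=labels).loss  # backpropagate this in a training loop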
Upvotes: 2