ScalaBoy

Reputation: 3392

Loss function for class imbalanced multi-class classifier in Keras

I am trying to apply deep learning to a multi-class classification problem with high class imbalance between target classes (10K, 500K, 90K, 30K). I want to write a custom loss function. This is my current model:

from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense, Dropout, Flatten

model = Sequential()

model.add(LSTM(units=10,                # size of the LSTM hidden state
               return_sequences=True,   # emit one output per timestamp
               input_shape=(timestamps, nb_features),
               dropout=0.2,
               recurrent_dropout=0.2))

model.add(TimeDistributed(Dense(1)))    # project each timestamp to a scalar

model.add(Dropout(0.2))

model.add(Flatten())

model.add(Dense(units=nb_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              metrics=['accuracy'],
              optimizer='adadelta')

Unfortunately, all predictions belong to class 1: the model predicts class 1 for every input.

I'd appreciate any pointers on how to solve this task.

Update:

Dimensions of input data:

94981 train sequences
29494 test sequences
X_train shape: (94981, 20, 18)
X_test shape: (29494, 20, 18)
y_train shape: (94981, 4)
y_test shape: (29494, 4)

Basically, the training data contains 94981 samples; each sample is a sequence of 20 timestamps with 18 features per timestamp.

The imbalance between target classes (10K, 500K, 90K, 30K) is just an example. I have similar proportions in my real dataset.


Upvotes: 0

Views: 1959

Answers (1)

Szymon Maszke

Reputation: 24874

First of all, you have ~100k samples. Start with something much smaller, say 100 samples, train for multiple epochs, and check whether your model can overfit this tiny training set. If it can't, you either have a bug in your code or the model is not capable of modelling the dependencies (I would bet on the second case). Seriously, start with this step, and make sure all of your classes are represented in the small dataset.
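A sketch of how to build such a class-balanced subset, assuming `y_train` is one-hot encoded as in the question (the random labels below are a stand-in so the snippet runs on its own; the per-class count of 25 is an arbitrary choice):

```python
import numpy as np

# Stand-in one-hot labels with the question's shape (94981, 4);
# replace with the real y_train.
rng = np.random.RandomState(0)
y_train = np.eye(4)[rng.randint(0, 4, size=94981)]

# Pick a few samples per class so every class is represented.
samples_per_class = 25
labels = y_train.argmax(axis=1)
idx = np.concatenate([np.where(labels == c)[0][:samples_per_class]
                      for c in range(4)])

# Then train for many epochs on this subset and check that training
# accuracy approaches 1.0, e.g.:
# model.fit(X_train[idx], y_train[idx], epochs=200, batch_size=16)
```

If training accuracy stays low even on 100 samples, debug the model and data pipeline before worrying about the imbalance.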

Secondly, the hidden size of the LSTM may be too small: you have 18 features per timestamp and sequences of length 20, yet your hidden size is only 10. On top of that you apply dropout, which regularizes the network even further.

Furthermore, you may want to add some dense output units instead of merely returning a linear layer of size 10 x 1 for each timestamp.

Last but not least, you may want to upsample the underrepresented classes. Class 0 would have to be repeated, say, 50 times (or maybe 25), class 2 around 4 times and class 3 around 10-15 times, so the network actually trains on them.
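Instead of physically repeating rows, the same effect can be had through Keras's `class_weight` argument to `fit()`. A minimal sketch, assuming the class counts from the question and a simple inverse-frequency weighting heuristic:

```python
import numpy as np

# Class counts from the question: (10K, 500K, 90K, 30K).
counts = np.array([10_000, 500_000, 90_000, 30_000])

# Weight each class inversely to its frequency, normalised so the
# largest class gets weight 1.0.
class_weight = {c: float(counts.max() / n) for c, n in enumerate(counts)}
# class 0 gets weight 50.0, roughly matching the "repeat ~50 times" above.

# Passed to Keras as:
# model.fit(X_train, y_train, class_weight=class_weight, ...)
```

This rescales each sample's contribution to the loss, which is usually easier to tune than duplicating rows and does not inflate the dataset.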

Oh, and use cross-validation for your hyperparameters, like the hidden size, the number of dense units, etc.

Plus, I don't know how many epochs you've been training this network for, or what your test dataset looks like (it's entirely possible it consists only of the first class if you haven't stratified the split).

I think this will get you started, hit me up with any doubts in the comments.

EDIT: When it comes to metrics, you may want to check something other than mere accuracy, e.g. the F1 score, monitored alongside your loss and accuracy, to see how the model really performs. There are other choices as well; for inspiration check sklearn's metrics documentation, which provides quite a few options.
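A toy illustration of why accuracy misleads on imbalanced data (the labels are hypothetical; `f1_score` is from scikit-learn, which the paragraph above points to):

```python
import numpy as np
from sklearn.metrics import f1_score

# A degenerate classifier that always predicts the majority class.
y_true = np.array([1] * 90 + [0] * 10)   # 90% of samples are class 1
y_pred = np.ones(100, dtype=int)         # model always says "class 1"

accuracy = float((y_true == y_pred).mean())           # 0.9 -- looks great
macro_f1 = f1_score(y_true, y_pred, average='macro')  # ~0.47 -- exposes it
```

Macro-averaged F1 gives every class equal weight, so the completely ignored minority class drags the score down even though accuracy stays at 90%.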

Upvotes: 4
