Josh

Reputation: 3311

Custom loss function not improving with epochs

I have created a custom loss function to deal with a binary class imbalance, but the loss does not decrease from epoch to epoch. For metrics, I'm using precision and recall.

Is this a design issue where I'm not picking good hyper-parameters?

weights = [np.array([.10,.90]), np.array([.5,.5]), np.array([.1,.99]), np.array([.25,.75]), np.array([.35,.65])]
for weight in weights:
    print('Model with weights {a}'.format(a=weight))
    model = keras.models.Sequential([
        keras.layers.Flatten(),  # input_shape=[X_train.shape[1]]
        keras.layers.Dense(32, activation='relu'),
        keras.layers.Dense(32, activation='relu'),
        keras.layers.Dense(1, activation='sigmoid')])
    model.compile(loss=weighted_loss(weight), metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])
    
    n_epochs = 10
    history = model.fit(X_train.astype('float32'), y_train.values.astype('float32'), epochs=n_epochs, validation_data=(X_test.astype('float32'), y_test.values.astype('float32')), batch_size=64)   
    model.evaluate(X_test.astype('float32'), y_test.astype('float32'))
    pd.DataFrame(history.history).plot(figsize=(8, 5))
    plt.grid(True); plt.gca().set_ylim(0, 1); plt.show()

Custom loss function to deal with class imbalance issue:

def weighted_loss(weights):
    weights = K.variable(weights)            
    def loss(y_true, y_pred):
        y_pred /= K.sum(y_pred, axis=-1, keepdims=True)
        y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
        loss = y_true * K.log(y_pred) * weights
        loss = -K.sum(loss, -1)      
        return loss
    return loss

Output:

Model with weights [0.1 0.9]
Epoch 1/10
274/274 [==============================] - 1s 2ms/step - loss: 1.1921e-08 - precision_24: 0.1092 - recall_24: 0.4119 - val_loss: 1.4074e-08 - val_precision_24: 0.1247 - val_recall_24: 0.3953
Epoch 2/10
274/274 [==============================] - 0s 1ms/step - loss: 1.1921e-08 - precision_24: 0.1092 - recall_24: 0.4119 - val_loss: 1.4074e-08 - val_precision_24: 0.1247 - val_recall_24: 0.3953
Epoch 3/10
274/274 [==============================] - 0s 1ms/step - loss: 1.1921e-08 - precision_24: 0.1092 - recall_24: 0.4119 - val_loss: 1.4074e-08 - val_precision_24: 0.1247 - val_recall_24: 0.3953
Epoch 4/10
274/274 [==============================] - 0s 969us/step - loss: 1.1921e-08 - precision_24: 0.1092 - recall_24: 0.4119 - val_loss: 1.4074e-08 - val_precision_24: 0.1247 - val_recall_24: 0.3953
[...]

[Image: the input dataset, a (17480 x 20) matrix, with the true y class labels]

y is the output array (2 classes) with dimensions (17480 x 1); the total number of 1's is 1748 (the class that I want to predict).

Upvotes: 1

Views: 690

Answers (1)

Since there is no MWE (minimal working example), it's difficult to be sure. To be as instructive as possible, I'll lay out some observations and remarks.

The first observation is that your custom loss takes very small values, on the order of 1e-8, throughout training. That tells the model it is already doing well, while the metrics you chose show that it isn't. This indicates that the problem sits near the output layer or in the loss function itself. Since this is a binary classification problem, my recommendation is to have a look at this post on weighted cross-entropy [1].
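For concreteness, and purely as an assumption based on the snippet you posted: with a single sigmoid unit the last axis of y_pred has size 1, so K.sum(y_pred, axis=-1, keepdims=True) just returns y_pred, and the division y_pred /= ... forces every prediction to 1.0 before the log is taken, which would explain a near-zero, constant loss. A minimal sketch of a weighted binary cross-entropy that works with a single sigmoid output could look like this (the function name and weight values are placeholders, not your exact setup):

from tensorflow.keras import backend as K

def weighted_binary_loss(zero_weight, one_weight):
    # Placeholder class weights; tune them for your imbalance ratio
    def loss(y_true, y_pred):
        y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
        # Standard binary cross-entropy per sample
        bce = -(y_true * K.log(y_pred) + (1 - y_true) * K.log(1 - y_pred))
        # Weight each sample by the weight of its true class
        sample_weight = y_true * one_weight + (1 - y_true) * zero_weight
        return K.mean(sample_weight * bce, axis=-1)
    return loss

With a working loss you would compile as before, e.g. model.compile(optimizer='adam', loss=weighted_binary_loss(0.1, 0.9), metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()]). Alternatively, the built-in class_weight argument of model.fit achieves a similar effect with the standard 'binary_crossentropy' loss.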

The second observation is that you don't seem to have a performance benchmark for your model. An ML workflow generally moves from very simple to complex models. I would recommend trying a simple Logistic Regression [2] to get an idea of the minimal achievable performance, and after that some more complex models such as a gradient-boosted tree ensemble (XGBoost/LightGBM/...) or a random forest. This is especially relevant since you are using a full-blown neural network on tabular data with only about 20 numerical features, which is still very much traditional machine-learning territory.
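As an illustration of how little code such a baseline takes (a sketch; X_train, y_train, X_test, y_test are assumed to be the arrays from your snippet):

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

# class_weight='balanced' reweights classes inversely to their frequency,
# addressing the same imbalance the custom loss is meant to handle
clf = LogisticRegression(class_weight='balanced', max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print('precision:', precision_score(y_test, y_pred))
print('recall:   ', recall_score(y_test, y_pred))

These two numbers give you a floor that any more complex model should beat.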

Once you have obtained a baseline and perhaps improved performance using a standard machine learning technique, you can look towards a neural network again. Some other recommendations depending on the results of the traditional approaches are:

  • Try several optimizers and cross-validate them over different learning rates.

  • Try, as mentioned by @TyQuangTu, some simpler and shallower architectures.

  • Try an activation function that does not suffer from the "dying neuron" problem, such as LeakyReLU or ELU (see the sketch after this list).
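Putting the last two points together, a minimal sketch of what such a simpler network could look like (the layer size, learning rate, and choice of Adam are assumptions, not tuned values):

from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Flatten(),
    keras.layers.Dense(16),
    keras.layers.LeakyReLU(alpha=0.1),   # avoids the dying-ReLU problem
    keras.layers.Dense(1, activation='sigmoid')])

# An explicit optimizer object makes the learning rate easy to cross-validate
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss='binary_crossentropy',
              metrics=[keras.metrics.Precision(), keras.metrics.Recall()])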

Hopefully this answer helps; if you have any more questions, I am glad to help.

[1] Unbalanced data and weighted cross entropy

[2] https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

Upvotes: 2
