Petar

Reputation: 171

Which loss function to use in Keras Sequential Model

I am using a Keras Sequential model; each prediction output has shape (1, 5) (5 features).

I have an accuracy metric defined as follows:

For N predictions, the accuracy of the model is the percentage of predicted samples that match their true labels, where a prediction matches if every feature differs from its respective true value by no more than 10.

For example, if y_i = [1, 2, 3, 4, 5] and ypred_i = [1, 2, 3, 4, 16], it is not a match, since the last feature differs by 11. If y_i = [1, 2, 3, 4, 5] and ypred_i = [10, 8, 0, 5, 7], it is a match, because every feature differs from its respective true value by no more than 10.
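For reference, the matching rule above can be checked with plain NumPy outside the training loop (this is just the rule, not a trainable loss; `is_match` is a name made up for this sketch):

```python
import numpy as np

def is_match(y_true, y_pred, tol=10):
    # True if every feature is within `tol` of its true value
    return bool(np.all(np.abs(np.asarray(y_true) - np.asarray(y_pred)) <= tol))

print(is_match([1, 2, 3, 4, 5], [1, 2, 3, 4, 16]))  # -> False (last feature differs by 11)
print(is_match([1, 2, 3, 4, 5], [10, 8, 0, 5, 7]))  # -> True (all differences <= 10)
```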

I am wondering which loss function to use in my Keras Sequential model so as to maximize the accuracy described above. Should I define a custom loss function, and if so, what should it look like? How should I proceed?

My code is:

class NeuralNetMulti(Regressor):
    def __init__(self):
        self.name = 'keras-sequential'
        self.model = Sequential()
        # self.earlystopping = callbacks.EarlyStopping(monitor="mae",
        #                                              mode="min", patience=5,
        #                                              restore_best_weights=True)

    def fit(self, X, y):
        print('Fitting into the neural net...')
        n_inputs = X.shape[1]
        n_outputs = y.shape[1]
        self.model.add(Dense(400, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
        # self.model.add(Dense(20, activation='relu'))
        self.model.add(Dense(200, activation='relu'))
        # self.model.add(Dense(10, activation='relu'))
        self.model.add(Dense(n_outputs))
        self.model.summary()
        self.model.compile(loss='mae', optimizer='adam', metrics=['mse', 'mae', 'accuracy'])
        history = self.model.fit(X, y, verbose=1, epochs=200, validation_split=0.1)
        # self.model.fit(X, y, verbose=1, epochs=1000, callbacks=[self.earlystopping])
        print('Fitting completed!')

    def predict(self, X):
        print('Predicting...')
        predictions = self.model.predict(X, verbose=1)
        print('Predicted!')
        return predictions

My suggestion for a loss function:

def N_distance(y_true, y_pred):
    # 0 if every feature is within 10 of its true value, else 1
    vals = abs(y_true - y_pred)
    if all(a <= 10 for a in vals):
        return 0
    return 1

It returns:

Upvotes: 2

Views: 1650

Answers (1)

Paloha

Reputation: 690

First of all, your loss needs to be differentiable so that it is possible to compute its gradient with respect to the weights. Only then can the gradient be used to optimize the weights, which is the whole point of gradient-based optimization algorithms like gradient descent. If you write your own loss, this is the first thing you need to keep in mind, and it is why your loss does not work as written. You need to rethink either your loss or the whole problem.
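To make this concrete, here is a small NumPy illustration (not Keras code) of why the proposed 0/1 loss gives the optimizer nothing to work with: it is piecewise constant, so a tiny change in the predictions leaves the loss unchanged and the effective gradient is zero almost everywhere.

```python
import numpy as np

def step_loss(y_true, y_pred):
    # The question's proposed loss: 1 if any feature is off by more than 10, else 0
    return 0.0 if np.all(np.abs(y_true - y_pred) <= 10) else 1.0

y_true = np.array([1., 2., 3., 4., 5.])
y_pred = np.array([1., 2., 3., 4., 16.])

eps = 1e-4
base = step_loss(y_true, y_pred)
nudged = step_loss(y_true, y_pred + eps)    # tiny update to the prediction
print(base, nudged, (nudged - base) / eps)  # -> 1.0 1.0 0.0  (zero finite-difference gradient)
```

Because the finite-difference gradient is 0, a gradient-based optimizer would never move the weights, no matter how wrong the prediction is.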

Next, do not forget that you need to use Keras or TensorFlow functions in your loss, so that the functions you use have gradients defined and the chain rule can be applied. Using plain abs() is not a good idea. This question might point you in the right direction: https://ai.stackexchange.com/questions/26426/why-is-tf-abs-non-differentiable-in-tensorflow.

Furthermore, from your question and comments I see the expected output should be between 0 and 100. In that case, I would scale the outputs and the labels of the network so that they always lie in that range. There are multiple ways to go about it: divide your labels by 100 and use a sigmoid activation on the outputs, or check e.g. this answer: How to restrict output of a neural net to a specific range?

Then you can start thinking about how to write your loss. From your description it is not clear what should happen in this case: y_i = [1, 2, 3, 4, 100] and pred = [1, 2, 3, 4, 110]. Is the value 110 still acceptable, even though it should not be possible in theory?

Anyway, you can just use mae or mse as the loss. Your network will try to fit the targets as closely as possible, and you can use your special non-differentiable function purely as a metric to measure how well the network is trained according to your rules.

An explicit example:

  • The last layer of your network needs an activation specified, like so: self.model.add(Dense(n_outputs, activation='sigmoid')), which scales all network outputs to the interval from 0 to 1.
  • Since your labels are defined on the interval from 0 to 100, divide them by 100 (y = y / 100) so they also lie in the interval from 0 to 1 before using them in the network.
  • Then you can use mae or mse as the loss and your special function just as a metric: self.model.compile(loss='mae', optimizer='adam', metrics=[custom_metric])

The custom_metric function can look like this:

import tensorflow as tf

def custom_metric(y_true, y_pred):
    valid_distance = 0.1  # corresponds to 10 on the original 0-100 scale
    valid = tf.abs(y_true - y_pred) <= valid_distance
    # fraction of samples where every feature is within the valid distance
    return tf.reduce_mean(tf.cast(tf.reduce_all(valid, axis=1), tf.float32))
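As a quick offline sanity check, the same computation can be mirrored in plain NumPy (the function name custom_metric_np and the sample values are made up for this sketch; they reuse the question's examples rescaled to the 0-1 range):

```python
import numpy as np

def custom_metric_np(y_true, y_pred, valid_distance=0.1):
    # NumPy mirror of the TF metric: fraction of rows where all features match
    valid = np.abs(y_true - y_pred) <= valid_distance
    return np.all(valid, axis=1).mean()

y_true = np.array([[0.01, 0.02, 0.03, 0.04, 0.05],
                   [0.01, 0.02, 0.03, 0.04, 0.05]])
y_pred = np.array([[0.01, 0.02, 0.03, 0.04, 0.16],   # off by 0.11 -> no match
                   [0.10, 0.08, 0.00, 0.05, 0.07]])  # all within 0.1 -> match
print(custom_metric_np(y_true, y_pred))  # -> 0.5 (1 of 2 samples matches)
```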

Upvotes: 1
