Reputation: 5071
Keras offers the possibility to define custom evaluation metrics (I am interested in variations of the F-measure, e.g. F1, F2, etc., which are provided by scikit-learn), but instructs us to do so by invoking Keras backend functions, which are limited in that respect.
My aim is to use these metrics in conjunction with the Early-Stopping mechanism of Keras, so I need a way to integrate the metric into the learning process of a Keras model. (Of course, outside the learning/fitting process I can simply invoke scikit-learn on the results, as sketched below.)
What are my options here?
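For reference, this is the kind of post-hoc evaluation I have in mind outside of fitting (a minimal sketch; model, X_val and y_val stand in for my own objects):

from sklearn.metrics import f1_score, fbeta_score

# Reduce predicted probabilities and one-hot labels to class indices
y_pred = model.predict(X_val).argmax(axis=1)
y_true = y_val.argmax(axis=1)

print(f1_score(y_true, y_pred, average='macro'))
print(fbeta_score(y_true, y_pred, beta=2, average='macro'))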
Having implemented Aaron's solution with the titanic_all_numeric dataset from Kaggle, I get the following:
# Compile the model
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy', f1])
# Fit the model
hist = model.fit(predictors, target, validation_split = 0.3)
Train on 623 samples, validate on 268 samples
Epoch 1/1
623/623 [==============================] - 0s 642us/step - loss: 0.8037 - acc: 0.6132 - f1: 0.6132 - val_loss: 0.5815 - val_acc: 0.7537 - val_f1: 0.7537
# Compile the model
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
# Fit the model
hist = model.fit(predictors, target, validation_split = 0.3)
Train on 623 samples, validate on 268 samples
Epoch 1/1
623/623 [==============================] - 0s 658us/step - loss: 0.8148 - acc: 0.6404 - val_loss: 0.7056 - val_acc: 0.7313
# Compile the model
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = [f1])
# Fit the model
hist = model.fit(predictors, target, validation_split = 0.3)
Train on 623 samples, validate on 268 samples
Epoch 1/1
623/623 [==============================] - 0s 690us/step - loss: 0.6554 - f1: 0.6709 - val_loss: 0.5107 - val_f1: 0.7612
I am wondering if these results are fine. For one, the accuracy and the F1 score are identical. (I suspect this is a property of the metric itself: with one-hot targets, rounding the softmax output makes the batch-wise true positives coincide with the correct predictions, so F1 collapses to accuracy.)
Upvotes: 4
Views: 5105
Reputation: 4060
You can simply pass the predictions and labels from your Keras model to any scikit-learn function for evaluation purposes. For example, if you are tackling a classification problem, you could use classification_report from scikit-learn, which provides metrics such as precision, recall and F1-score (sample code taken straight from its docs):
from sklearn.metrics import classification_report
y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))
              precision    recall  f1-score   support

     class 0       0.50      1.00      0.67         1
     class 1       0.00      0.00      0.00         1
     class 2       1.00      0.67      0.80         3

   micro avg       0.60      0.60      0.60         5
   macro avg       0.50      0.56      0.49         5
weighted avg       0.70      0.60      0.61         5
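To produce y_true and y_pred from a trained Keras model, you would first reduce the predicted probabilities to class labels, along these lines (a sketch; model, X_test and y_test are placeholders for your own objects):

from sklearn.metrics import classification_report

probs = model.predict(X_test)   # shape (n_samples, n_classes)
y_pred = probs.argmax(axis=1)   # most probable class per sample
y_true = y_test.argmax(axis=1)  # undo the one-hot encoding
print(classification_report(y_true, y_pred))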
Update: In case you want to incorporate the metric inside Keras training, you can use the following:
from keras import backend as K

def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        """Recall metric.

        Only computes a batch-wise average of recall.

        Computes the recall, a metric for multi-label classification of
        how many relevant items are selected.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        """Precision metric.

        Only computes a batch-wise average of precision.

        Computes the precision, a metric for multi-label classification of
        how many selected items are relevant.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision

    precision = precision(y_true, y_pred)
    recall = recall(y_true, y_pred)
    return 2 * ((precision * recall) / (precision + recall + K.epsilon()))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=[f1])
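Keras prefixes validation metrics with val_, so you can then monitor this custom metric with early stopping, for instance (a sketch; the patience value is arbitrary):

from keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(monitor='val_f1', mode='max', patience=5)
model.fit(predictors, target, validation_split=0.3,
          callbacks=[early_stopping])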
Upvotes: 2