user3179067

Reputation: 23

How to use a Gaussian Process for Binary Classification?

I know that a Gaussian Process model is best suited to regression rather than classification. However, I would still like to apply a Gaussian Process to a classification task, and I am not sure of the best way to bin the predictions generated by the model. I have reviewed the Gaussian Process classification example available on the scikit-learn website at:

http://scikit-learn.org/stable/auto_examples/gaussian_process/plot_gp_probabilistic_classification_after_regression.html

But I found this example confusing (I have listed the things I found confusing about it at the end of the question). To try to get a better understanding, I have created a very basic Python code example using scikit-learn that generates classifications by applying a decision boundary to the predictions made by a Gaussian Process:

#A minimal example illustrating how to use a
#Gaussian Process for binary classification
import numpy as np
from sklearn import metrics
from sklearn.metrics import confusion_matrix
from sklearn.gaussian_process import GaussianProcess

if __name__ == "__main__":
    #defines some basic training and test data
    #If the descriptive features have large values
    #(i.e., 8s and 9s) the target is 1
    #If the descriptive features have small values
    #(i.e., 2s and 3s) the target is 0
    TRAININPUTS = np.array([[8, 9, 9, 9, 9],
                            [9, 8, 9, 9, 9],
                            [9, 9, 8, 9, 9],
                            [9, 9, 9, 8, 9],
                            [9, 9, 9, 9, 8],
                            [2, 3, 3, 3, 3],
                            [3, 2, 3, 3, 3],
                            [3, 3, 2, 3, 3],
                            [3, 3, 3, 2, 3],
                            [3, 3, 3, 3, 2]])
    TRAINTARGETS = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
    TESTINPUTS = np.array([[8, 8, 9, 9, 9],
                           [9, 9, 8, 8, 9],
                           [3, 3, 3, 3, 3],
                           [3, 2, 3, 2, 3],
                           [3, 2, 2, 3, 2],
                           [2, 2, 2, 2, 2]])
    TESTTARGETS = np.array([1, 1, 0, 0, 0, 0])
    DECISIONBOUNDARY = 0.5

    #Fit a gaussian process model to the data
    gp = GaussianProcess(theta0=10e-1, random_start=100)
    gp.fit(TRAININPUTS, TRAINTARGETS)
    #Generate a set of predictions for the test data
    y_pred = gp.predict(TESTINPUTS)
    print "Predicted Values:"
    print y_pred
    print "----------------"
    #Convert the continuous predictions into the classes
    #by splitting on a decision boundary of 0.5
    predictions = []
    for y in y_pred:
        if y > DECISIONBOUNDARY:
            predictions.append(1)
        else:
            predictions.append(0)
    print "Binned Predictions (decision boundary = 0.5):"
    print predictions
    print "----------------"
    #Print the confusion matrix, specifying 1 as the positive class
    cm = confusion_matrix(TESTTARGETS, predictions, labels=[1, 0])
    print "Confusion Matrix (1 as positive class):"
    print cm
    print "----------------"
    print "Classification Report:"
    print metrics.classification_report(TESTTARGETS, predictions)

When I run this code I get the following output:

Predicted Values:
[ 0.96914832  0.96914832 -0.03172673  0.03085167  0.06066993  0.11677634]
----------------
Binned Predictions (decision boundary = 0.5):
[1, 1, 0, 0, 0, 0]
----------------
Confusion Matrix (1 as positive class):
[[2 0]
 [0 4]]
----------------
Classification Report:
         precision    recall  f1-score   support

          0       1.00      1.00      1.00         4
          1       1.00      1.00      1.00         2

avg / total       1.00      1.00      1.00         6

The approach used in this basic example seems to work fine with this simple dataset. But this approach is very different from the classification example given on the scikit-learn website that I mentioned above (URL repeated here):

http://scikit-learn.org/stable/auto_examples/gaussian_process/plot_gp_probabilistic_classification_after_regression.html

So I'm wondering if I am missing something here, and I would appreciate it if anyone could:

  1. With respect to the classification example given on the scikit-learn website:

    1.1 explain what the probabilities generated in this example are probabilities of. Are they the probability of the query instance belonging to the class y > 0?

    1.2 why the example uses a cumulative distribution function instead of a probability density function?

    1.3 why the example divides the predictions made by the model by the square root of the mean squared error before they are input into the cumulative distribution function?

  2. With respect to the basic code example I have listed here, clarify whether or not applying a simple decision boundary to the predictions generated by a Gaussian Process model is an appropriate way to do binary classification?

Sorry for such a long question and thanks for any help.

Upvotes: 2

Views: 4762

Answers (1)

user1149913

Reputation: 4523

In a GP classifier, a standard GP distribution over functions is "squashed", usually through the standard normal CDF (this is the probit model), to map it to a distribution over binary categories.
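For intuition, here is a minimal sketch of that squashing step, using made-up latent function values rather than output from an actual GP:

#Minimal sketch: squashing latent GP values through the standard
#normal CDF maps them to probabilities in (0, 1)
from scipy.stats import norm

latent_values = [-2.0, 0.0, 2.0]  #hypothetical latent GP outputs
probabilities = [norm.cdf(f) for f in latent_values]
print(probabilities)  #approx [0.023, 0.5, 0.977]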

Another interpretation of this process is through a hierarchical model (this paper has the derivation), with a hidden variable drawn from a Gaussian Process.

In sklearn's GP library, it looks like the outputs of y_pred, MSE = gp.predict(xx, eval_MSE=True) are the (approximate) posterior means (y_pred) and posterior variances (MSE) evaluated at the points in xx, before any squashing occurs.

To obtain the probability that a point from the test set belongs to the positive class, you can convert the normal distribution over y_pred to a binary distribution by applying the normal CDF (again, see the paper linked above for details).
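Putting that together, here is a minimal sketch using the old GaussianProcess API from your question (this class was removed from later scikit-learn releases); TRAININPUTS, TRAINTARGETS and TESTINPUTS are the arrays defined in your code:

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcess

#Fit the GP and also request the posterior variance (eval_MSE=True)
gp = GaussianProcess(theta0=10e-1, random_start=100)
gp.fit(TRAININPUTS, TRAINTARGETS)
y_pred, MSE = gp.predict(TESTINPUTS, eval_MSE=True)

#Squash the latent posterior through the standard normal CDF.
#Dividing by sqrt(MSE), the posterior standard deviation, accounts
#for the uncertainty of the latent function at each test point.
p_positive = norm.cdf(y_pred / np.sqrt(MSE))
print(p_positive)

This division by the posterior standard deviation is the same operation the scikit-learn example you linked performs before applying the CDF, which is the quantity your question 1.3 asks about.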

Under this probit model, the natural decision boundary on the latent function is 0 (the standard normal distribution is symmetric around 0, meaning PHI(0) = 0.5). So you should set DECISIONBOUNDARY = 0.
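Continuing the sketch above, and assuming you recode the {0, 1} targets as {-1, +1} so that the latent GP mean is centred on the class boundary (an illustrative choice, not something your code currently does), the 0-threshold decision rule looks like this:

#Recode {0, 1} targets as {-1, +1} (illustrative assumption)
gp.fit(TRAININPUTS, 2 * TRAINTARGETS - 1)
y_pred, MSE = gp.predict(TESTINPUTS, eval_MSE=True)

#Thresholding the latent mean at 0 is equivalent to thresholding
#the squashed probability norm.cdf(y_pred / np.sqrt(MSE)) at 0.5,
#since the CDF is monotone increasing and PHI(0) = 0.5
predictions = (y_pred > 0).astype(int)  #back to {0, 1} labels
print(predictions)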

Upvotes: 3
