user442920

Reputation: 847

How to get the full covariance matrix and find its entropy in GPflow

I would like to compute the determinant of the covariance matrix of a GP regression in GPflow. I am guessing I can get the covariance matrix with this function:

GPModel.predict_f_full_cov

This function was suggested here:

https://gpflow.readthedocs.io/en/develop/notebooks/regression.html

However, I have no idea how to use this function or what it returns. I need a function that returns the covariance matrix for my entire model, and then I need to know how to compute its determinant.

After some effort, I figured out how to give predict_f_full_cov some points I am interested in, as we see here:

c = m.predict_f_full_cov(np.array([[.2],[.4],[.6],[.8]]))

This returned two arrays: the first is the mean of the predicted function at the points I asked for along the x-axis. The second array is a bit of a mystery; I am guessing it is the covariance matrix. I pulled it out using this:

covMatrix = m.predict_f_full_cov(np.array([[.2],[.4],[.6],[.8]]))[1][0]
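
Printing the shapes of the two returned arrays makes the structure easier to see (just a quick check; the exact axis ordering of the covariance output may differ between GPflow versions):

mean, cov = m.predict_f_full_cov(np.array([[.2], [.4], [.6], [.8]]))
print(mean.shape)  # one predicted mean per test point and output dimension
print(cov.shape)   # the stacked full covariance block(s) over the four test points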

Then I looked up how to compute the determinant, like so:

x = np.linalg.det(covMatrix)

Then I computed the log of this to get an entropy for the covariance matrix:

print(-10*math.log(np.linalg.det(covMatrix)))

I ran this twice using two different sets of data. The first had high noise, the second had low noise. Strangely, the entropy went up for the lower noise data set. I am at a loss.

I found that if I just compute the covariance matrix over a small region, where the function should be roughly linear, turning the noise up and down does not do what I expect. Also, if I regress the GP on a large number of points, the determinant goes to 0.0.

Here is the code I am using:

import gpflow
import math
import numpy as np

N = 300
noiseSize = 0.01
X = np.random.rand(N, 1)
Y = np.sin(12 * X) + 0.66 * np.cos(25 * X) + np.random.randn(N, 1) * noiseSize + 3

# fit a GPR model with a Matern-5/2 kernel
k = gpflow.kernels.Matern52(1, lengthscales=0.3)
m = gpflow.models.GPR(X, Y, kern=k)
m.likelihood.variance = 0.01

# 200 evenly spaced test points, shaped as a column vector
newRange = np.linspace(0.1, 0.9, 200).reshape(-1, 1)
covMatrix = m.predict_f_full_cov(newRange)[1][0]

print("Determinant: " + str(np.linalg.det(covMatrix)))
print(-10 * math.log(np.linalg.det(covMatrix)))

Upvotes: 3

Views: 905

Answers (1)

user442920

Reputation: 847

So, first things first: the entropy of a multivariate normal (and of a GP evaluated on a fixed set of points) depends only on its covariance matrix.
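
Concretely, the differential entropy of a $k$-dimensional multivariate normal with covariance $\Sigma$ is $\frac{k}{2}\log(2\pi e) + \frac{1}{2}\log\det\Sigma$, so for a fixed set of evaluation points it is a monotone function of the log-determinant.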

Answers to your questions:

  1. Yes - when you make the set $X$ more and more dense, you are making the covariance matrix larger and larger, and for many simple covariance kernels this makes the determinant smaller and smaller. My guess is that this is because determinants of large matrices have a lot of product terms (see the Leibniz formula), and products of terms less than one go to zero faster than their sums. Another way to see it: nearby inputs produce nearly identical rows, so the matrix becomes nearly singular and its determinant (the product of its eigenvalues) collapses towards zero. You can verify this easily:

[Figure: Log Determinant of Covariance Matrix vs. Dimension of Covariance Matrix]

Code for this:

import numpy as np
import matplotlib.pyplot as plt
import sklearn.gaussian_process.kernels as k

plt.style.use("ggplot"); plt.ion()

n = np.linspace(2, 25, 23, dtype = int)
d = np.zeros(len(n))

for i in range(len(n)):
    X = np.linspace(-1, 1, n[i]).reshape(-1, 1)
    S = k.RBF()(X)
    d[i] = np.log(np.linalg.det(S))

plt.scatter(n, d)
plt.ylabel("Log Determinant of Covariance Matrix")
plt.xlabel("Dimension of Covariance Matrix")

Before moving on to the next point, do note that the entropy of a multivariate normal also has a contribution from the size of the matrix, so even though the determinant shoots off to zero, there is still a small contribution from the dimension (a numerically safer way to compute the full entropy is sketched at the end of this answer).

  2. With decreasing noise, as one would expect, the entropy and determinant do decrease, but they do not tend to zero exactly; they level off at the determinant contributed by the other kernels in the covariance. For the demonstration below, the dimension of the covariance is kept fixed at $10 \times 10$ and the noise level is swept from $10$ down to $10^{-10}$:

[Figure: Log Determinant vs. Log Error]

Code:

# log-determinant of an RBF + white-noise kernel matrix as the noise level shrinks
e = np.logspace(1, -10, 30)
d = np.zeros(len(e))
X = np.linspace(-1, 1, 10).reshape(-1, 1)

for i in range(len(e)):
    S = (k.RBF() + k.WhiteKernel(e[i]))(X)
    d[i] = np.log(np.linalg.det(S))

plt.figure()  # start a new figure so this does not overlay the previous plot
plt.scatter(np.log(e), d)
plt.ylabel("Log Determinant")
plt.xlabel("Log Error")

Upvotes: 1
