Reputation: 43
I am trying to implement a non-stationary Gaussian covariance function in Python based on equations (5,6) of Paciorek and Schervish, 2005. See attached image:
I have produced some code which I believe is doing the right thing, although it is very inefficient as it populates the matrix C element-wise. See the synthetic example below:
import numpy as np
from scipy.spatial.distance import cdist

np.random.seed(1)
n = 100
x = np.random.rand(n,2)
length_scales = np.random.rand(n,2)  # diagonal entries of each Sigma_i
sigma2 = 1

C = np.zeros((n,n))
for i in range(n):
    Sigmai = np.diag(length_scales[i,:])
    xi = np.atleast_2d(x[i,:]).T
    for j in range(n):
        Sigmaj = np.diag(length_scales[j,:])
        xj = np.atleast_2d(x[j,:]).T
        # cdist returns a 1x1 array here; [0,0] extracts the scalar
        Qij = cdist(np.dot(np.diag(1/(((Sigmai+Sigmaj)/2).diagonal())),xi).T,
                    np.dot(np.diag(1/(((Sigmai+Sigmaj)/2).diagonal())),xj).T,'sqeuclidean')[0,0]
        C[i,j] = sigma2 * np.prod(Sigmai.diagonal())**.25 * np.prod(Sigmaj.diagonal())**.25 *\
                 np.prod(((Sigmai+Sigmaj)/2).diagonal())**-.5 * np.exp(-Qij)
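Written out, the loop above evaluates the following for diagonal matrices Sigma_i = diag(l_i). This is transcribed from the code rather than from the paper, and the symbols \ell and m are just labels I am using here for `length_scales` and the averaged diagonal:

```latex
\begin{aligned}
C_{ij} &= \sigma^2 \, |\Sigma_i|^{1/4} \, |\Sigma_j|^{1/4} \,
          \left| \frac{\Sigma_i + \Sigma_j}{2} \right|^{-1/2} \exp(-Q_{ij}), \\
Q_{ij} &= \sum_{d=1}^{2} \left( \frac{x_{i,d} - x_{j,d}}{m_{ij,d}} \right)^{2},
\qquad m_{ij,d} = \frac{\ell_{i,d} + \ell_{j,d}}{2},
\qquad \Sigma_i = \operatorname{diag}(\ell_{i,1}, \ell_{i,2}).
\end{aligned}
```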
I realise I can make this slightly more efficient by populating only the lower triangle of C; however, with large n this is still very slow.
My question is: is it possible to rewrite the above code so that I don't have to compute C iteratively?
Upvotes: 1
Views: 273
Reputation: 43
See an example using np.meshgrid as a way to avoid for loops, and a comparison of run times for each case:
for loops
import numpy as np
from scipy.spatial.distance import cdist
import time

np.random.seed(1)
n = 100
x = np.random.rand(n,2)
length_scales = np.random.rand(n,2)
sigma2 = 1

t = time.time()
C1 = np.zeros((n,n))
for i in range(n):
    Sigmai = np.diag(length_scales[i,:])
    xi = np.atleast_2d(x[i,:]).T
    for j in range(n):
        Sigmaj = np.diag(length_scales[j,:])
        xj = np.atleast_2d(x[j,:]).T
        # cdist returns a 1x1 array here; [0,0] extracts the scalar
        Qij = cdist(np.dot(np.diag(1/(((Sigmai+Sigmaj)/2).diagonal())),xi).T,
                    np.dot(np.diag(1/(((Sigmai+Sigmaj)/2).diagonal())),xj).T,'sqeuclidean')[0,0]
        C1[i,j] = sigma2 * np.prod(Sigmai.diagonal())**.25 * np.prod(Sigmaj.diagonal())**.25 *\
                  np.prod(((Sigmai+Sigmaj)/2).diagonal())**-.5 * np.exp(-Qij)
print('for loops:',time.time()-t,'seconds')
np.meshgrid
t = time.time()
Sigma = np.prod(length_scales,axis=1)**.25
length_scales_x1,length_scales_x2 = np.meshgrid(length_scales[:,0],length_scales[:,0])
length_scales_y1,length_scales_y2 = np.meshgrid(length_scales[:,1],length_scales[:,1])
length_mean = np.array([(length_scales_x1+length_scales_x2)/2,(length_scales_y1+length_scales_y2)/2]).transpose(1,2,0)
Sigma_i,Sigma_j = np.meshgrid(Sigma,Sigma)
Sigma_ij = np.prod(length_mean,2)**-.5
x1,x2 = np.meshgrid(x[:,0],x[:,0])
y1,y2 = np.meshgrid(x[:,1],x[:,1])
xi = np.reshape(np.array([x1,y1]).transpose(1,2,0)/length_mean,(x.shape[0]*x.shape[0],2))
xj = np.reshape(np.array([x2,y2]).transpose(1,2,0)/length_mean,(x.shape[0]*x.shape[0],2))
Qij = np.reshape((xi[:,0]-xj[:,0])**2 + (xi[:,1]-xj[:,1])**2,(x.shape[0],x.shape[0]))
C2 = sigma2 * Sigma_i * Sigma_j * Sigma_ij * np.exp(-Qij)
print('meshgrids:',time.time()-t,'seconds')
print(np.isclose(C1,C2,atol=1e-12))
Output:
for loops: 0.6633138656616211 seconds
meshgrids: 0.0023801326751708984 seconds
[[ True True True ... True True True]
[ True True True ... True True True]
[ True True True ... True True True]
...
[ True True True ... True True True]
[ True True True ... True True True]
[ True True True ... True True True]]
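The same result can probably also be obtained with plain NumPy broadcasting instead of np.meshgrid. Below is a minimal sketch, assuming diagonal Sigma matrices as in the question; the function name nonstationary_cov is just an illustrative choice and not part of the code above:

```python
import numpy as np

def nonstationary_cov(x, length_scales, sigma2=1.0):
    """Broadcasted form of the loop-based kernel (diagonal Sigma_i only)."""
    # Pairwise mean of the diagonal entries, shape (n, n, d)
    mean_ls = (length_scales[:, None, :] + length_scales[None, :, :]) / 2
    # Per-point factor |Sigma_i|^(1/4), shape (n,)
    det_quarter = np.prod(length_scales, axis=1) ** 0.25
    # Pairwise factor |(Sigma_i + Sigma_j)/2|^(-1/2), shape (n, n)
    det_mean = np.prod(mean_ls, axis=2) ** -0.5
    # Coordinate differences scaled by the averaged diagonal, as in the cdist call
    diff = (x[:, None, :] - x[None, :, :]) / mean_ls
    Q = np.sum(diff ** 2, axis=2)
    return sigma2 * det_quarter[:, None] * det_quarter[None, :] * det_mean * np.exp(-Q)

# Synthetic data in the same spirit as the example above
rng = np.random.default_rng(1)
x = rng.random((100, 2))
length_scales = rng.random((100, 2))
C = nonstationary_cov(x, length_scales)
print(C.shape, np.allclose(C, C.T))
```

Note that this builds (n, n, 2) intermediate arrays, so memory use grows with n squared, which may matter for large n.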
Upvotes: 1