Reputation: 1947
I want to define a multivariate normal distribution with mean [1, 1, 1] and a variance-covariance matrix with 0.3 on the diagonal, and then calculate the log-likelihood at the data point [2, 3, 4] using torch.distributions:
import torch
import torch.distributions as td

input_x = torch.tensor([2., 3., 4.])
loc = torch.ones(3)
scale = torch.eye(3) * 0.3
mvn = td.MultivariateNormal(loc=loc, scale_tril=scale)
mvn.log_prob(input_x)
tensor(-76.9227)
From scratch, using the formula for the log-likelihood

log p(x) = -1/2 * log((2π)^k * |Σ|) - 1/2 * (x - μ)ᵀ Σ⁻¹ (x - μ)

we obtain:

import numpy as np

# normalization term: |Σ| = 0.3^3, so (2π)^3 * |Σ| = (2π * 0.3)^3
first_term = (2 * np.pi * 0.3) ** 3
first_term = -np.log(np.sqrt(first_term))
# quadratic term: -1/2 * (x - μ)ᵀ Σ⁻¹ (x - μ)
x_center = input_x - loc
tmp = torch.matmul(x_center, scale.inverse())
tmp = -1/2 * torch.matmul(tmp, x_center)
first_term + tmp
tensor(-24.2842)
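For reference, the same formula written out directly in NumPy (a sketch, independent of torch) reproduces this hand-computed value:

```python
import numpy as np

x = np.array([2., 3., 4.])
mu = np.ones(3)
cov = np.eye(3) * 0.3
k = 3

# log N(x; mu, Σ) = -1/2 * log((2π)^k * |Σ|) - 1/2 * (x - mu)^T Σ^{-1} (x - mu)
norm_const = -0.5 * np.log((2 * np.pi) ** k * np.linalg.det(cov))
quad = -0.5 * (x - mu) @ np.linalg.inv(cov) @ (x - mu)
print(norm_const + quad)  # approximately -24.2842
```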
My question is - what's the source of this discrepancy?
Upvotes: 1
Views: 350
Reputation: 6115
You are passing the covariance matrix to scale_tril instead of covariance_matrix. From the docs of PyTorch's MultivariateNormal:

    scale_tril (Tensor) – lower-triangular factor of covariance, with positive-valued diagonal

So, replacing scale_tril with covariance_matrix yields the same result as your manual attempt.
In [1]: mvn = td.MultivariateNormal(loc = loc, covariance_matrix=scale)
In [2]: mvn.log_prob(input_x)
Out[2]: tensor(-24.2842)
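To see where the original -76.9227 came from: scale_tril=L means the distribution's actual covariance is L @ L.T, which for your scale is 0.09 * I rather than 0.3 * I. Evaluating the density under that implied covariance reproduces the number (a sketch reusing the same tensors as above):

```python
import torch
import torch.distributions as td

loc = torch.ones(3)
scale = torch.eye(3) * 0.3

# scale_tril=L implies covariance = L @ L.T, here 0.09 * I instead of 0.3 * I
implied_cov = scale @ scale.T
mvn_implied = td.MultivariateNormal(loc=loc, covariance_matrix=implied_cov)
print(mvn_implied.log_prob(torch.tensor([2., 3., 4.])))  # tensor(-76.9227)
```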
However, according to the authors, it is more efficient to use scale_tril:

    ...Using scale_tril will be more efficient:

You can compute the lower Cholesky factor with torch.linalg.cholesky:
In [3]: mvn = td.MultivariateNormal(loc = loc, scale_tril=torch.linalg.cholesky(scale))
In [4]: mvn.log_prob(input_x)
Out[4]: tensor(-24.2842)
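For this particular covariance (a multiple of the identity) the Cholesky factor can also be written down directly, since chol(σ² * I) = σ * I; a small sketch:

```python
import torch
import torch.distributions as td

loc = torch.ones(3)

# For 0.3 * I the lower Cholesky factor is simply sqrt(0.3) * I
L = torch.eye(3) * (0.3 ** 0.5)
assert torch.allclose(L, torch.linalg.cholesky(torch.eye(3) * 0.3))

mvn = td.MultivariateNormal(loc=loc, scale_tril=L)
print(mvn.log_prob(torch.tensor([2., 3., 4.])))  # tensor(-24.2842)
```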
Upvotes: 2