Dennix

Reputation: 115

Generating Correlated Random Samples

I read this tutorial https://scipy-cookbook.readthedocs.io/items/CorrelatedRandomSamples.html on how to get a matrix C so that C*C^T = R, with R being a given covariance matrix. The code example implements two different methods: Cholesky decomposition, and a construction from the eigenvalues and eigenvectors. To my surprise, printing the resulting C of the two methods gives me two different matrices:

Eigenvalue method result:

[[ 0.11928653 -0.86036701  1.6265114 ]
 [ 0.00835653 -0.89810227 -2.16641235]
 [ 0.18832863  0.58480336 -0.93409708]]

Cholesky method result:

[[ 1.84390889  0.          0.        ]
 [-1.4913969   1.80989925  0.        ]
 [-1.08465229 -0.06500199  0.26325682]]

If someone could explain to me why the two resulting matrices are different I would be very grateful.

Code to produce the results:

"""Example of generating correlated normally distributed random samples."""

import numpy as np
from scipy.linalg import eigh, cholesky
from scipy.stats import norm

from pylab import plot, show, axis, subplot, xlabel, ylabel, grid


# Choice of cholesky or eigenvector method.
method = 'cholesky'
#method = 'eigenvectors'

num_samples = 400

# The desired covariance matrix.
r = np.array([
        [  3.40, -2.75, -2.00],
        [ -2.75,  5.50,  1.50],
        [ -2.00,  1.50,  1.25]
    ])

# Generate samples from three independent normally distributed random
# variables (with mean 0 and std. dev. 1).
x = norm.rvs(size=(3, num_samples))

# We need a matrix `c` for which `c*c^T = r`.  We can use, for example,
# the Cholesky decomposition, or we can construct `c` from the
# eigenvectors and eigenvalues.

if method == 'cholesky':
    # Compute the Cholesky decomposition.
    c = cholesky(r, lower=True)
else:
    # Compute the eigenvalues and eigenvectors.
    evals, evecs = eigh(r)
    # Construct c, so c*c^T = r.
    c = np.dot(evecs, np.diag(np.sqrt(evals)))

# Convert the data to correlated random variables. 
y = np.dot(c, x)

#
# Plot various projections of the samples.
#
subplot(2,2,1)
plot(y[0], y[1], 'b.')
ylabel('y[1]')
axis('equal')
grid(True)

subplot(2,2,3)
plot(y[0], y[2], 'b.')
xlabel('y[0]')
ylabel('y[2]')
axis('equal')
grid(True)

subplot(2,2,4)
plot(y[1], y[2], 'b.')
xlabel('y[1]')
axis('equal')
grid(True)

show()
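For completeness: with either method, the resulting c does satisfy c @ c.T = r up to floating-point error, so both are valid factorizations; a quick check appended after the script above:

# Verify that c is a valid factor of the desired covariance matrix.
print(np.allclose(np.dot(c, c.T), r))  # True for both methods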

Upvotes: 2

Views: 1602

Answers (2)

lightalchemist

Reputation: 10211

As you yourself have said, the decomposition of R into CC^T is not unique. There is more than one possible matrix C that satisfies this expression.

To see this, let

U @ U.T = R = L @ L.T

where '@' is the operator that performs matrix multiplication of numpy arrays, 'U.T' is the transpose of U, U is the factor built from the eigendecomposition, and L is the Cholesky factor.

Given any orthogonal matrix Q, i.e. any Q with Q @ Q.T = I,

(U @ Q) @ (U @ Q).T = U @ Q @ Q.T @ U.T = U @ U.T = R

so U @ Q is just as valid a factor of R as U itself.
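A minimal sketch of this argument (the orthogonal Q here is generated from a QR decomposition of a random matrix, purely for illustration):

import numpy as np
from scipy.linalg import eigh

r = np.array([
        [  3.40, -2.75, -2.00],
        [ -2.75,  5.50,  1.50],
        [ -2.00,  1.50,  1.25]
    ])

# Factor r from its eigendecomposition.
evals, evecs = eigh(r)
U = evecs @ np.diag(np.sqrt(evals))

# Any orthogonal Q yields another valid factor U @ Q.
Q, _ = np.linalg.qr(np.random.randn(3, 3))
print(np.allclose((U @ Q) @ (U @ Q).T, r))  # True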

In fact, we can solve for a specific matrix Q such that

U = evecs @ np.diag(np.sqrt(evals))
L = cholesky(r, lower=True)
U @ Q = L

To do this,

import numpy as np
from scipy.linalg import eigh, cholesky

r = np.array([
        [  3.40, -2.75, -2.00],
        [ -2.75,  5.50,  1.50],
        [ -2.00,  1.50,  1.25]
    ])

evals, evecs = eigh(r)
U = evecs @ np.diag(np.sqrt(evals))
L = cholesky(r, lower=True)
Q = np.linalg.pinv(U) @ L
U_new = U @ Q
print("U_new:")
print(U_new)
print("L:")
print(L)
print("U_new == L up to numerical differences: {}".format(np.allclose(U_new, L))

And the output is

U_new:
[[ 1.84390889e+00  1.87763803e-17 -1.53304396e-17]
 [-1.49139690e+00  1.80989925e+00  4.13531074e-19]
 [-1.08465229e+00 -6.50019933e-02  2.63256819e-01]]
L:
[[ 1.84390889  0.          0.        ]
 [-1.4913969   1.80989925  0.        ]
 [-1.08465229 -0.06500199  0.26325682]]
U_new == L up to numerical differences: True
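As a side note, the recovered Q is itself orthogonal up to floating-point error, consistent with the argument above; reusing Q from the code:

# Q maps one valid factor to another, so it must be orthogonal.
print(np.allclose(Q @ Q.T, np.eye(3)))  # True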

Upvotes: 2

Dennix

Reputation: 115

I think I have meanwhile found the answer. The decomposition of the covariance matrix R into R = CC^T is not unique. Different methods, such as the Cholesky decomposition or the eigenvalue approach, yield different matrices C that all satisfy R = CC^T.
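A quick check along these lines, reusing the matrix r from the question:

import numpy as np
from scipy.linalg import eigh, cholesky

r = np.array([
        [  3.40, -2.75, -2.00],
        [ -2.75,  5.50,  1.50],
        [ -2.00,  1.50,  1.25]
    ])

evals, evecs = eigh(r)
c_eig = evecs @ np.diag(np.sqrt(evals))
c_chol = cholesky(r, lower=True)

# The two factors differ...
print(np.allclose(c_eig, c_chol))         # False
# ...but both reproduce the covariance matrix.
print(np.allclose(c_eig @ c_eig.T, r))    # True
print(np.allclose(c_chol @ c_chol.T, r))  # True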

Upvotes: 0
