Baron Yugovich

Reputation: 4307

Python - generate array of specific autocorrelation

I am interested in generating an array (or pandas Series) of length N that exhibits a specific autocorrelation at lag 1. Ideally, I also want to specify the mean and variance, and have the data drawn from a (multivariate) normal distribution. But most importantly, I want to specify the autocorrelation. How do I do this with numpy or scikit-learn?

Just to be explicit and precise, this is the autocorrelation I want to control:

numpy.corrcoef(x[0:len(x) - 1], x[1:])[0][1]

Upvotes: 11

Views: 6136

Answers (1)

Dan Oneață

Reputation: 978

If you are interested only in the auto-correlation at lag one, you can generate an auto-regressive process of order one, AR(1), whose parameter equals the desired auto-correlation. This property is stated on the Wikipedia page for the autoregressive model, and it is not hard to prove.
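For reference, here is a short sketch of that proof. The symbols are my own notation and assume a stationary process: \varphi is the AR(1) parameter (`corr` in the code), c the offset, \sigma_\varepsilon the standard deviation of the white noise, and \mu, \sigma the mean and standard deviation of the signal.

```latex
% AR(1) process with |\varphi| < 1 and white noise \varepsilon_t independent of X_{t-1}
X_t = c + \varphi X_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma_\varepsilon^2)

% Stationary mean: take expectations of both sides
\mu = c + \varphi \mu \;\Rightarrow\; c = \mu (1 - \varphi)

% Stationary variance: take variances of both sides
\sigma^2 = \varphi^2 \sigma^2 + \sigma_\varepsilon^2 \;\Rightarrow\; \sigma_\varepsilon^2 = \sigma^2 (1 - \varphi^2)

% Lag-1 covariance and autocorrelation
\operatorname{Cov}(X_t, X_{t-1}) = \varphi \operatorname{Var}(X_{t-1}) + \operatorname{Cov}(\varepsilon_t, X_{t-1}) = \varphi \sigma^2
\rho(1) = \frac{\operatorname{Cov}(X_t, X_{t-1})}{\sigma^2} = \varphi
```

The second and third lines are where the expressions for the offset and the noise standard deviation come from.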

Here is some sample code:

import numpy as np

def sample_signal(n_samples, corr, mu=0, sigma=1):
    assert -1 < corr < 1, "Auto-correlation coefficient must be between -1 and 1"
    
    # Find out the offset `c` and the std of the white noise `sigma_e`
    # that produce a signal with the desired mean and variance.
    # See https://en.wikipedia.org/wiki/Autoregressive_model
    # under section "Example: An AR(1) process".
    c = mu * (1 - corr)
    sigma_e = np.sqrt((sigma ** 2) * (1 - corr ** 2))

    # Sample the auto-regressive process.
    signal = [np.random.normal(mu, sigma)]
    for _ in range(1, n_samples):
        signal.append(c + corr * signal[-1] + np.random.normal(0, sigma_e))

    return np.array(signal)

def compute_corr_lag_1(signal):
    return np.corrcoef(signal[:-1], signal[1:])[0][1]

# Examples.
print(compute_corr_lag_1(sample_signal(5000, 0.5)))
print(np.mean(sample_signal(5000, 0.5, mu=2)))
print(np.std(sample_signal(5000, 0.5, sigma=3)))

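The parameter corr lets you set the desired auto-correlation at lag one, and the optional parameters mu and sigma let you control the mean and standard deviation of the generated signal. The printed values are sample estimates, so they match the targets only approximately and get closer as n_samples grows.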
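If you need reproducible draws, one option is to pass a seeded NumPy Generator instead of relying on the global random state. The following is only a sketch under that assumption: `sample_ar1` and its `rng` argument are hypothetical names of mine, not part of the code above, and it checks all three targets on a single draw.

```python
import numpy as np

def sample_ar1(n_samples, corr, mu=0.0, sigma=1.0, rng=None):
    """Hypothetical variant of sample_signal that accepts a NumPy Generator."""
    if not -1 < corr < 1:
        raise ValueError("corr must lie strictly between -1 and 1")
    rng = np.random.default_rng() if rng is None else rng

    # Offset and white-noise std that give the requested mean and variance.
    c = mu * (1 - corr)
    sigma_e = sigma * np.sqrt(1 - corr ** 2)

    # Draw all noise terms up front; noise[0] is unused because the first
    # sample comes straight from the stationary distribution N(mu, sigma^2).
    noise = rng.normal(0.0, sigma_e, size=n_samples)
    signal = np.empty(n_samples)
    signal[0] = rng.normal(mu, sigma)
    for t in range(1, n_samples):
        signal[t] = c + corr * signal[t - 1] + noise[t]
    return signal

# Assumed example settings: seed 0, 20000 samples.
rng = np.random.default_rng(0)
x = sample_ar1(20000, 0.5, mu=2.0, sigma=3.0, rng=rng)
lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(lag1, x.mean(), x.std())
```

The three printed numbers should land near 0.5, 2 and 3 respectively; they will not be exact because they are estimates from a finite sample.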

Upvotes: 9
