Legend

Reputation: 116940

Result not matching. Floating point error?

I am trying to rewrite R's acf function, which computes the autocorrelation of a series, in C#:

using System;
using System.Collections.Generic;
using System.Linq;

class AC
{
    static void Main(string[] args)
    {
        double[] y = new double[] { 772.9, 909.4, 1080.3, 1276.2, 1380.6, 1354.8, 1096.9, 1066.7, 1108.7, 1109, 1203.7, 1328.2, 1380, 1435.3, 1416.2, 1494.9, 1525.6, 1551.1, 1539.2, 1629.1, 1665.3, 1708.7, 1799.4, 1873.3, 1973.3, 2087.6, 2208.3, 2271.4, 2365.6, 2423.3, 2416.2, 2484.8, 2608.5, 2744.1, 2729.3, 2695, 2826.7, 2958.6, 3115.2, 3192.4, 3187.1, 3248.8, 3166, 3279.1, 3489.9, 3585.2, 3676.5 };
        Console.WriteLine(String.Join("\n", acf(y, 17)));
        Console.Read();
    }

    public static double[] acf(double[] series, int maxlag)
    {
        List<double> acf_values = new List<double>();
        float flen = (float)series.Length;
        float xbar = ((float)series.Sum()) / flen;
        int N = series.Length;

        // Sample variance: mean-centered sum of squares, divided by N
        double variance = 0.0;
        for (int j = 0; j < N; j++)
        {
            variance += (series[j] - xbar)*(series[j] - xbar);
        }
        variance = variance / N;

        for (int lag = 0; lag < maxlag + 1; lag++)
        {
            if (lag == 0)
            {
                acf_values.Add(1.0);
                continue;
            }

            double autocv = 0.0;   // autocovariance at this lag
            for (int k = 0; k < N - lag; k++)
            {
                autocv += (series[k] - xbar) * (series[lag + k] - xbar);
            }
            autocv = autocv / (N - lag);

            acf_values.Add(autocv / variance);
        }
        return acf_values.ToArray();
    }
}

I have two problems with this code:

  1. For large arrays (length = 25000), this code takes about 1-2 seconds whereas R's acf function returns in less than 200 ms.
  2. The output does not match R's output exactly.

Any suggestions on where I messed up or any optimizations to the code?

         C#             R
    1    1              1
    2    0.945805846    0.925682317
    3    0.89060465     0.85270658
    4    0.840762283    0.787096604
    5    0.806487301    0.737850083
    6    0.780259665    0.697253317
    7    0.7433111      0.648420319
    8    0.690344341    0.587527097
    9    0.625632533    0.519141887
    10   0.556860982    0.450228026
    11   0.488922355    0.38489632
    12   0.425406196    0.325843042
    13   0.367735169    0.273845337
    14   0.299647764    0.216766466
    15   0.22344712     0.156888402
    16   0.14575994     0.099240809
    17   0.072389526    0.047746281
    18  -0.003238526   -0.002067146

Upvotes: 0

Views: 168

Answers (1)

Matthew Lundberg

Reputation: 42669

You might try changing this line:

autocv = autocv / (N - lag);

to this:

autocv = autocv / N;

Either of these is an acceptable divisor for the autocovariance estimate, and R is clearly using the second one (it divides by N at every lag).

To see this without access to a C# compiler, we can read in the table you posted, divide each value in the C# column by N/(N - lag), and check that the adjusted values agree with the values from R.

N is 47 here, and lag ranges from 0 to 17, so N - lag is 47:30.

After copying the table above into my local clipboard:

cr <- read.table(file='clipboard', comment='', check.names=FALSE)
cr$adj <- cr[[1]]/47*(47:30)
max(abs(cr$R - cr$adj))
## [1] 2.2766e-09

A much closer approximation.
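
Applied to the C# method in the question, the change is confined to the normalization step. A minimal sketch of the lag loop (restructured slightly so lag 0 is handled before the loop), assuming xbar, variance, N, and acf_values are set up exactly as in the posted method:

acf_values.Add(1.0);                            // lag 0
for (int lag = 1; lag <= maxlag; lag++)
{
    double autocv = 0.0;
    for (int k = 0; k < N - lag; k++)
    {
        autocv += (series[k] - xbar) * (series[lag + k] - xbar);
    }
    acf_values.Add((autocv / N) / variance);    // divide by N, not (N - lag)
}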

You might do better still if you declare flen and xbar as double, since float does not carry 9 decimal digits of precision.
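
In the posted method that would amount to something like:

double flen = (double)series.Length;   // double instead of float
double xbar = series.Sum() / flen;     // Sum() over a double[] is already a double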

The reason R is so much faster is that its acf is implemented in native, unmanaged code (C or Fortran) rather than managed code.
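
If you also want to speed up the managed version, one possible sketch (names are assumed here, and this is not the algorithm R's acf uses internally) demeans the series once up front and computes the lags in parallel with Parallel.For; each lag writes only its own slot of the result array, so no locking is needed:

using System.Linq;
using System.Threading.Tasks;

static class FastAcf
{
    // Sketch: biased estimator (divide by N at every lag), series demeaned once,
    // lags computed in parallel.
    public static double[] Acf(double[] series, int maxlag)
    {
        int n = series.Length;
        double mean = series.Average();
        double[] d = series.Select(v => v - mean).ToArray();   // demeaned copy
        double variance = d.Sum(v => v * v) / n;

        double[] result = new double[maxlag + 1];
        result[0] = 1.0;
        Parallel.For(1, maxlag + 1, lag =>
        {
            double autocv = 0.0;
            for (int k = 0; k < n - lag; k++)
                autocv += d[k] * d[k + lag];
            result[lag] = (autocv / n) / variance;   // each lag writes only its own slot
        });
        return result;
    }
}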

Upvotes: 2
