Reputation: 116940
I am trying to rewrite the R function acf
that computes Auto-Correlation into C#:
class AC
{
static void Main(string[] args)
{
double[] y = new double[] { 772.9, 909.4, 1080.3, 1276.2, 1380.6, 1354.8, 1096.9, 1066.7, 1108.7, 1109, 1203.7, 1328.2, 1380, 1435.3, 1416.2, 1494.9, 1525.6, 1551.1, 1539.2, 1629.1, 1665.3, 1708.7, 1799.4, 1873.3, 1973.3, 2087.6, 2208.3, 2271.4, 2365.6, 2423.3, 2416.2, 2484.8, 2608.5, 2744.1, 2729.3, 2695, 2826.7, 2958.6, 3115.2, 3192.4, 3187.1, 3248.8, 3166, 3279.1, 3489.9, 3585.2, 3676.5 };
Console.WriteLine(String.Join("\n", acf(y, 17)));
Console.Read();
}
public static double[] acf(double[] series, int maxlag)
{
List<double> acf_values = new List<double>();
float flen = (float)series.Length;
float xbar = ((float)series.Sum()) / flen;
int N = series.Length;
double variance = 0.0;
for (int j = 0; j < N; j++)
{
variance += (series[j] - xbar)*(series[j] - xbar);
}
variance = variance / N;
for (int lag = 0; lag < maxlag + 1; lag++)
{
if (lag == 0)
{
acf_values.Add(1.0);
continue;
}
double autocv = 0.0;
for (int k = 0; k < N - lag; k++)
{
autocv += (series[k] - xbar) * (series[lag + k] - xbar);
}
autocv = autocv / (N - lag);
acf_values.Add(autocv / variance);
}
return acf_values.ToArray();
}
}
I have two problems with this code:
acf
function returns in less than 200 ms.Any suggestions on where I messed up or any optimizations to the code?
C# R
1 1 1
2 0.945805846 0.925682317
3 0.89060465 0.85270658
4 0.840762283 0.787096604
5 0.806487301 0.737850083
6 0.780259665 0.697253317
7 0.7433111 0.648420319
8 0.690344341 0.587527097
9 0.625632533 0.519141887
10 0.556860982 0.450228026
11 0.488922355 0.38489632
12 0.425406196 0.325843042
13 0.367735169 0.273845337
14 0.299647764 0.216766466
15 0.22344712 0.156888402
16 0.14575994 0.099240809
17 0.072389526 0.047746281
18 -0.003238526 -0.002067146
Upvotes: 0
Views: 168
Reputation: 42669
You might try changing this line:
autocv = autocv / (N - lag);
to this:
autocv = autocv / N;
Either of these is an acceptable divisor for the expected value, and R is clearly using the second one.
To see this without having access to a C# compiler, we can read in the table that you have, and adjust the values by dividing each value in the C#
column by N/(N - lag)
, and see that they agree with the values from R.
N
is 47
here, and lag ranges from 0 to 17, so N - lag
is 47:30
.
After copying the table above into my local clipboard:
cr <- read.table(file='clipboard', comment='', check.names=FALSE)
cr$adj <- cr[[1]]/47*(47:30)
max(abs(cr$R - cr$adj))
## [1] 2.2766e-09
A much closer approximation.
You might do better if you define flen
and xbar
as type double
as floats do not have 9 decimal digits of precision.
The reason that R is so much faster is that acf
is implemented as native and non-managed code (either C or FORTRAN).
Upvotes: 2