mickG

Reputation: 335

How to efficiently calculate log-returns

I have a multi-dimensional array double[,] results where each column represents a price time series for a particular item (e.g. car, house, ...). I would like to calculate the log-returns for each time series as Log(price_t / price_t1), where t > t1, so that I generate a new time series of log-returns for each column of double[,] results. How can this be done efficiently in C#? The amount of data is large and I was trying a solution like:

for (int col = 0; col < C; col++)
{
    for (int row = 0; row < R - 1; row++)
    {
        double ret = Math.Log(results[row + 1, col] / results[row, col]);
        // store ret in the output series for this column...
    }
}

where C and R are the numbers of columns and rows in double[,] results. This solution runs quite slowly and seems very inefficient. Any suggestion on how to perform a similar calculation faster?

I saw that in languages like MATLAB one can vectorize the code and simply divide the original matrix by another one that is lagged by one element, then take the log of the entire matrix that results from the division. Is this doable in C#, and if so, how?

Upvotes: 1

Views: 1076

Answers (1)

Martin Liversage

Reputation: 106836

If your computer has multiple cores you can easily improve the speed of your computation. To try it out myself I started by creating this function:

Double[,] ComputeLogReturns(Double[,] data) {
  var rows = data.GetLength(0);
  var columns = data.GetLength(1);
  var result = new Double[rows - 1, columns];
  for (var row = 0; row < rows - 1; row += 1)
    for (var column = 0; column < columns; column += 1)
      result[row, column] = Math.Log(data[row + 1, column]/data[row, column]);
  return result;
}

I benchmarked this function with an input array of 1,000 x 1,000 values. On my computer the execution time of 100 calls was around 3 seconds.
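
The timing was done with a simple Stopwatch loop. A minimal sketch of such a harness is shown below (filling the 1,000 x 1,000 input with arbitrary positive values is my assumption here, and the numbers will of course vary from machine to machine):

// Sketch of a timing harness; uses Stopwatch from System.Diagnostics.
void Benchmark() {
  // Fill a 1,000 x 1,000 array with arbitrary positive "prices".
  var data = new Double[1000, 1000];
  var random = new Random(42);
  for (var row = 0; row < 1000; row += 1)
    for (var column = 0; column < 1000; column += 1)
      data[row, column] = random.NextDouble() + 0.5;
  // Time 100 calls of the function under test.
  var stopwatch = Stopwatch.StartNew();
  for (var i = 0; i < 100; i += 1)
    ComputeLogReturns(data);
  stopwatch.Stop();
  Console.WriteLine(stopwatch.Elapsed);
}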

Because the iterations of the outer loop can execute in parallel, I then rewrote the function to use Parallel.For:

Double[,] ComputeLogReturnsParallel(Double[,] data) {
  var rows = data.GetLength(0);
  var columns = data.GetLength(1);
  var result = new Double[rows - 1, columns];
  Parallel.For(0, rows - 1, row => {
    for (var column = 0; column < columns; column += 1)
      result[row, column] = Math.Log(data[row + 1, column]/data[row, column]);
  });
  return result;
}

On my computer with 4 cores (8 logical cores) the execution of 100 calls takes around 0.9 seconds. This is slightly more than 3 times as fast, suggesting that only the physical cores, and not the additional logical (hyper-threaded) cores, contribute to computing the logarithms.

Modern x86 CPUs have special instructions called SSE that allow you to vectorize certain computations. I would expect that MATLAB uses these instructions, which might explain why you experience much better performance in MATLAB compared to your own C# code.
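
As an aside, element-wise division can also be expressed with SIMD directly in C# through System.Numerics.Vector<T>, assuming a runtime (or the System.Numerics.Vectors package) that supports it. This is only a rough sketch and separate from the Yeppp! experiments below; Math.Log still has to be applied per element afterwards:

// Requires System.Numerics (Vector<T>); a sketch, not the code I benchmarked.
// Computes quotient[i] = numerator[i] / denominator[i] element-wise.
void DivideSimd(Double[] numerator, Double[] denominator, Double[] quotient) {
  var width = Vector<Double>.Count;
  var i = 0;
  // Full SIMD-width chunks.
  for (; i <= numerator.Length - width; i += width) {
    var n = new Vector<Double>(numerator, i);
    var d = new Vector<Double>(denominator, i);
    (n / d).CopyTo(quotient, i);
  }
  // Scalar tail for any remaining elements.
  for (; i < numerator.Length; i += 1)
    quotient[i] = numerator[i] / denominator[i];
}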

To test SSE I tried Yeppp!, which has C# bindings. The library is available on NuGet as a prerelease and has a logarithm function. The vectorized functions only work on one-dimensional arrays, so I rewrote the baseline function:

Double[] ComputeLogReturns(Double[] data, Int32 rows, Int32 columns) {
  var result = new Double[(rows - 1)*columns];
  for (var row = 0; row < rows - 1; row += 1)
    for (var column = 0; column < columns; column += 1)
      result[row*columns + column] = Math.Log(data[(row + 1)*columns + column]/data[row*columns + column]);
  return result;
}
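
If your data starts out as a double[,], it can be flattened into a one-dimensional array with Buffer.BlockCopy. A small sketch, relying on the row-major layout that .NET uses for rectangular arrays:

// Flatten a rectangular Double[,] into a Double[] (row-major, as stored by .NET).
Double[] Flatten(Double[,] data) {
  var flat = new Double[data.Length];
  Buffer.BlockCopy(data, 0, flat, 0, data.Length*sizeof(double));
  return flat;
}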

With the same input and 100 iterations, execution time now seems to be slightly less than 3 seconds, indicating that a one-dimensional array might improve performance slightly (though logically it should not, unless it is the extra argument checking of multi-dimensional indexing that affects the execution time).

Using Yeppp! the function becomes:

Double[] ComputeLogReturnsSse(Double[] data, Int32 rows, Int32 columns) {
  var quotient = new Double[(rows - 1)*columns];
  for (var row = 0; row < rows - 1; row += 1)
    for (var column = 0; column < columns; column += 1)
      quotient[row*columns + column] = data[(row + 1)*columns + column]/data[row*columns + column];
  var result = new Double[(rows - 1)*columns];
  Yeppp.Math.Log_V64f_V64f(quotient, 0, result, 0, quotient.Length);
  return result;
}

I was unable to find a function to perform vectorized division using Yeppp!, so the division is performed using "normal" scalar division. However, I still expect the logarithm to be the most expensive operation. Initially performance was horrible, with 100 iterations taking 17 seconds, but then I noticed that there was an issue in Yeppp! about bad performance when running as a 32 bit process. Switching to 64 bit improved performance significantly, resulting in around 1.3 seconds of execution time. Getting rid of the two array allocations inside the function (which were repeated 100 times) lowered execution time to around 0.7 seconds, which is faster than the parallel implementation. Using Parallel.For to do the division lowered execution time to around 0.4 seconds. If Yeppp! had a way to perform division (which SSE has) you might get even lower execution times, perhaps resulting in a tenfold speed increase.
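
To illustrate those last two changes, here is a sketch of what the allocation-free variant with parallel division could look like. The caller owns the quotient and result buffers and reuses them across calls; the method name and exact shape are mine and may differ from the code behind the numbers above:

// Sketch: buffers are allocated once by the caller and reused across calls,
// and the element-wise division is parallelized with Parallel.For.
// Requires System.Threading.Tasks and the Yeppp! NuGet package.
void ComputeLogReturnsSseNoAlloc(Double[] data, Int32 rows, Int32 columns, Double[] quotient, Double[] result) {
  Parallel.For(0, rows - 1, row => {
    for (var column = 0; column < columns; column += 1)
      quotient[row*columns + column] = data[(row + 1)*columns + column]/data[row*columns + column];
  });
  Yeppp.Math.Log_V64f_V64f(quotient, 0, result, 0, quotient.Length);
}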

Based on my experiments with SSE you should be able to achieve considerable performance improvements. However, you should probably pay attention to precision if that matters. An SSE log function might provide slightly different results compared to the .NET implementation.
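
A quick way to check is to compare the two implementations on the same input and look at the largest difference; a small sketch (what tolerance to accept is up to you):

// Compare Yeppp! logarithms with Math.Log and report the largest absolute difference.
Double MaxLogDifference(Double[] values) {
  var yepppResult = new Double[values.Length];
  Yeppp.Math.Log_V64f_V64f(values, 0, yepppResult, 0, values.Length);
  var maxDifference = 0.0;
  for (var i = 0; i < values.Length; i += 1)
    maxDifference = Math.Max(maxDifference, Math.Abs(yepppResult[i] - Math.Log(values[i])));
  return maxDifference;
}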

Upvotes: 3
