Mike B
Mike B

Reputation: 478

How can I optimise array multiplication by constant?

Given the following code, where a, b, c, d etc are constants:

Data[] dataArray;
Intermediate[] interArray;
Output[] outputArray;

for (int i = 0; i < length; i++)
{
  interArray[i] = (c * dataArray[i]) + (a * dataArray[i+1]);
  interArray[i] -= (b * interArray[i - 1]) + (d * interArray[i - 2]);
  outputArray[i] = interArray[i];
}

for (int i = ln-1; i > 0; i--)
{
  interArray[i - 1] = (e * dataArray[i]) + (f * dataArray[i-1]);
  interArray[i - 1] -= (g * interArray[i]) + (h * interArray[i+1]);
  outputArray[i] += interArray[i]; 
} 

How can I optimise this?

I want to walk the arrays only once. Unfortunately, I am dependant on the fact that the second loop requires the interArray to be filled by the first loop.

The reason I want to do this is because this process takes up 20% of my total running time, and I'm trying to optimise it. The arrays can be very large, and the types are usually large PODs. I'm assuming I'm stepping into cache trashing territory, which is why I'm trying to reduce the number of times I walk over the array. There is no * operator, it's just standard multiplication.

Notes : I am aware that the upper and lower boundaries of the arrays crash and burn here, due to going out of bounds. I would manually handle these.

Any advice would be appreciated! It's possible that I can't do it any faster, but I'd like to at least try!

Upvotes: 1

Views: 145

Answers (1)

user3590169
user3590169

Reputation: 396

I'm not sure this will give you a huge time saving, but I think you can do the calcs in a single pass, by expanding out the terms. This could take it down to 6 multiplications and accumulations, as compared to 8 multiplications. Additionally you wouldn't need the intermediate array. It would look something like this (please double check the expansion of this)

Data[] dataArray;
Output[] outputArray;
auto DMinus2 = -c * d;
auto DMinus1 = -a - b*c;
auto D = c - a * b + f;
...

for (int i = 0; i < length; i++)
{
    outputArray[i] = DMinus2 * dataArray[i-2] + 
        DMinus1[i-1] * dataArray[i-1] +
        D * dataArray[i] + ....
        DPlus3 + dataArray[i+3];
}

EDIT:

Firstly, apologies, my first answer is not totally correct. However, I am fairly confident that it is possible to simplify the loops.

For example, in the first loop

interArray[i] = (c * dataArray[i]) + (a * dataArray[i+1]);
interArray[i] -= (b * interArray[i - 1]) + (d * interArray[i - 2]);
outputArray[i] = interArray[i];

Can be simplified to

interArray[i] = (c * dataArray[i]) + (a * dataArray[i+1]) -
    (b * interArray[i - 1]) + (d * interArray[i - 2]);
outputArray[i] = interArray[i];

I am assuming that values outside the range are 0 Consider i = 0 then we have that

outputArray[0] = (c * dataArray[0]) + (a * dataArray[1]);

i = 1 gives

outputArray[1] = (c * dataArray[1]) + (a * dataArray[2]) - 
    b * outputArray[0];

i = 2 gives

outputArray[2] = (c * dataArray[2]) + (a * dataArray[3]) - 
    b * outputArray[1] - d * outputArray[0];

So, I think we can generalise the first loop to remove the intermediate array

outputArray[i] = (c * dataArray[i]) + (a * dataArray[i+1]) - 
    b * outputArray[i-1] - d * outputArray[i-2];

The same should also be true for the second loop. Having reviewed my maths again, I am not totally sure it is possible to combine the two loops. I'll keep thinking about this, as there may be a way to do it. Hopefully removing the intermediate storage should help.

Upvotes: 2

Related Questions