Momergil

Reputation: 2281

Faster way to copy a C array with a calculation in between

I want to copy the data in one C array to another, but with a calculation in between (i.e. not just copying the same content from one to the other, but modifying the data along the way):

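#define ARRAY_SIZE 520 // example value; the buffers are 520 elements or more

int src[ARRAY_SIZE];
int dest[ARRAY_SIZE];

void scale_buffers(void) // illustrative wrapper so the fragment compiles
{
    int aaa;

    //fill src with data

    for (aaa = 0; aaa < ARRAY_SIZE; aaa++)
    {
        dest[aaa] = src[aaa] * 30;
    }
}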

This is done on buffers of 520 elements or more, so the cost of the loop is significant.

Is there any way to improve the performance of this at the source-code level?

I did some research on the topic, but I couldn't find anything specific to this case, only material about plain buffer-to-buffer copying (examples: here, here and here).

Environment: GCC for ARM on Embedded Linux. The specific code above, though, is part of a C project running on a dedicated processor for DSP calculations. The main processor is an OMAP L138 (the DSP core is integrated into the L138).

Upvotes: 0

Views: 276

Answers (1)

Clifford

Reputation: 93476

You could try techniques such as loop unrolling or Duff's device, but if you enable compiler optimisation, the compiler will probably apply such transformations itself wherever they are advantageous, without making your code unreadable.
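As a rough illustration of what manual loop unrolling looks like for this loop, the sketch below processes four elements per iteration and handles any leftovers in a cleanup loop. The function name `scale30_unrolled` and the unroll factor of 4 are illustrative choices, not taken from the question. Duff's device is a related trick that folds the cleanup into the unrolled body using a `switch`; it is not shown here.

```c
#include <stddef.h>

/* Multiply each element of src by 30 into dest, four elements per
 * iteration (manual loop unrolling). The factor 4 is arbitrary. */
void scale30_unrolled(int *dest, const int *src, size_t n)
{
    size_t i = 0;

    /* Main unrolled body: four independent multiplies per pass. */
    for (; i + 4 <= n; i += 4) {
        dest[i]     = src[i]     * 30;
        dest[i + 1] = src[i + 1] * 30;
        dest[i + 2] = src[i + 2] * 30;
        dest[i + 3] = src[i + 3] * 30;
    }

    /* Cleanup loop for the 0-3 elements left when n is not a multiple of 4. */
    for (; i < n; i++) {
        dest[i] = src[i] * 30;
    }
}
```

With optimisation switched on, the compiler may well generate similar or better code from the plain loop, so a version like this is mainly useful as something to time against the original.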

The advantage of relying on compiler optimisation is that it is architecture-specific: a source-level technique that works well on one target may not work so well on another, whereas compiler-generated optimisations are tailored to the target. For example, standard C has no way to express SIMD instructions directly, but the compiler may generate code that takes advantage of them; to make that possible, it is best to keep the code simple and direct so that the compiler can recognise the idiom. Writing contorted code to "hand-optimise" can defeat the optimiser and stop it doing its job.
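A sketch of the "simple and direct" style the answer recommends is shown below. The C99 `restrict` qualifiers are an addition not mentioned in the answer: they promise the compiler that `dest` and `src` do not overlap, which removes one common obstacle to auto-vectorisation. The name `scale30` is hypothetical. Whether the loop is actually vectorised depends on the compiler, target and flags (for GCC, e.g. `-O3`; `-fopt-info-vec` asks GCC to report which loops it vectorised).

```c
#include <stddef.h>

/* Plain loop that an auto-vectoriser can recognise.
 * restrict: the caller guarantees dest and src do not overlap. */
void scale30(int *restrict dest, const int *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dest[i] = src[i] * 30;
    }
}
```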

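Another possibility, which may help on some targets (it is probably irrelevant if you only ever target desktop x86), is to avoid the multiply instruction by using shifts. Bear in mind that left-shifting a negative signed int is undefined behaviour in C, so this is only safe for non-negative values: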

Given that x * 30 is equivalent to x * 32 - x * 2, the expression in the loop can be replaced with:

dest[aaa] = (src[aaa] << 5) - (src[aaa] << 1);

But again, the optimiser may well do that for you; it will also avoid the repeated evaluation of src[aaa]. If that were not the case, the following may be beneficial:

int i = src[aaa];
dest[aaa] = (i << 5) - (i << 1);
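Putting the shift form into a complete function gives something like the sketch below, which reads each element once into a local. The function name `scale30_shift` is hypothetical, and it assumes every input is non-negative and small enough that `x << 5` fits in an `int`, since shifting a negative `int` left or overflowing it is undefined behaviour.

```c
#include <stddef.h>

/* Multiply by 30 using shifts: x * 30 == x * 32 - x * 2 == (x << 5) - (x << 1).
 * Assumes 0 <= src[i] and src[i] << 5 fits in an int. */
void scale30_shift(int *dest, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int x = src[i];                /* read each element once */
        dest[i] = (x << 5) - (x << 1);
    }
}
```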

The shifting technique is likely to pay off more for division, which is far more expensive than multiplication on most targets. It only applies when the other operand is a constant, and for division a single shift only replaces division by a power of two.
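For division, a minimal sketch is dividing an unsigned value by 8 (2^3) with a right shift; for unsigned operands `u >> 3` and `u / 8` give the same result. The function name `div8_shift` is illustrative. For signed values the two are not interchangeable: `/` truncates toward zero, while right-shifting a negative `int` is implementation-defined in C (commonly an arithmetic shift, which rounds toward negative infinity), so `-7 / 2` is `-3` but `-7 >> 1` is typically `-4`.

```c
/* Divide an unsigned value by 8 using a shift.
 * Exact for all unsigned inputs; do not use this as-is for negative ints. */
unsigned div8_shift(unsigned u)
{
    return u >> 3;
}
```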

These techniques are likely to improve the performance of unoptimized code, but compiler optimisations will likely do far better, and the original code may optimise better than "hand optimised" code.

In the end, if performance matters, you have to experiment and run timing tests or profile the code.
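A rough sketch of such a timing test is given below, comparing the question's multiply loop with the shift variant over a 520-element buffer using the standard `clock()` function. The repetition count `REPS` and all function names are hypothetical. This is only a starting point: `clock()` measures processor time and may be unavailable or too coarse on a DSP, where a hardware cycle counter or the toolchain's profiler is likely a better choice, and an optimising compiler may discard repeated work whose results are never used.

```c
#include <stddef.h>
#include <time.h>

#define N 520        /* buffer size mentioned in the question */
#define REPS 10000   /* hypothetical repetition count; tune for your target */

/* Baseline: the loop from the question. */
static void scale30_mul(int *dest, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dest[i] = src[i] * 30;
}

/* Candidate: shift-based variant (non-negative inputs only). */
static void scale30_shl(int *dest, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int x = src[i];
        dest[i] = (x << 5) - (x << 1);
    }
}

/* Return the processor time, in seconds, taken by REPS calls to fn. */
static double time_it(void (*fn)(int *, const int *, size_t),
                      int *dest, const int *src, size_t n)
{
    clock_t start = clock();
    for (long r = 0; r < REPS; r++)
        fn(dest, src, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_it(scale30_mul, dest, src, N)` and `time_it(scale30_shl, dest, src, N)` with the same optimisation flags you ship with, and checking that both produce identical output, gives a fair comparison.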

Upvotes: 2
