Reputation: 44505
Suppose I have two integer arrays a
and b
with 10 int
s per array. Is there a way I can add the contents of b[i]
to a[i]
using some "memset" or "memcopy" trick? I'm just looking for something faster than the obvious for loop w/ a[i] += b[i]
etc.
Upvotes: 2
Views: 268
Reputation: 3274
"Silly" - I think it's an excellent question!
You say "adding" not "copying" and I'm assuming x86:
void addintvector (int *dstp, const int *srcp, int nints)
{
int *endp;
endp=dst+nints;
nints=srcp-dstp; // reuse nints
while (dstp!=endp)
{
*dstp+=*(dstp+nints); // makes use of the [base+index*4] x86 addressing
dstp+=1; // some prefer ++dstp but I don't when it comes to pointers
}
}
The loop should translate into
add_label:
mov eax,[ebx+esi*4]
add [ebx],eax
add ebx,4
cmp ebx,edx
jne add_label
That's five instructions per loop: it won't get much faster than that!
It's also easy to clone into subtract, divide and multiply variants.
Some speak of using a GPU but this requires that 1. the GPU interfaces with applications and 2. your array is large enough to overcome the associated overhead.
To overcome the call/return overhead you could experiment with declaring it inline.
Edit
I just read your comment "since it's for a game on a mobile device" and I guess it's not an x86 platform and therefore probably does not have a reg+reg*scale addressing mode. If that is the case the code should be written
void addintvector (int *dstp, const int *srcp, int nints)
{
int *endp;
endp=dst+nints;
while (dstp!=endp)
{
*dstp+=*srcp;
srcp+=1;
dstp+=1;
}
}
Not knowing which architecture you're targeting but assuming RISC I guess the code will be eight instructions long instead (in "unoptimized" psuedocode):
add_label:
mov tempreg1,[srcreg]
mov tempreg2,[dstreg]
add tempreg2,tempreg1
mov [dstreg],tempreg2
add srcreg,4
add dstreg,4
cmp dstreg,endreg
jne add_label
Upvotes: 3
Reputation: 76876
An std::valarray
seems like a good choice.
#include <valarray>
#include <algorithm>
#include <iostream>
#include <iterator>
int main()
{
std::valarray<int> a(3, 10);
std::valarray<int> b(4, 10);
std::valarray<int> result = a + b;
std::copy(&result[0], &result[0] + result.size(),
std::ostream_iterator<int>(std::cout, " "));
return 0;
}
a
and b
are arrays with ten elements, 3 and 4 respectively. Adding two valarray
s performs an element-wise addition. There are many other arithmetical operations defined for valarray
s.
You would have to test if this is any faster than an explicit loop. Since valarrays
are designed for such operations, the implementation might be in some way optimized.
Upvotes: 1
Reputation: 78973
If you are willing to use "pure" C there are variadic macros in C99. Use P99 for unrolling:
#include "p99_for.h"
#define ADDIT(Y, X, I) X[I] += Y[I]
#define ADD_MORE(Y, X, N) P99_FOR(Y, N, P00_SEP, ADDIT, P99_DUPL(N, X))
A line like
ADD_MORE(A, B, 3);
Then expands to
B[0] += A[0]; B[1] += A[1]; B[2] += A[2];
Upvotes: 1
Reputation: 2516
Maybe it worth consider OpenCL. If you have a lot of vector or matrix tasks, let'm solve GPU. Take a look at sample with sum of vectors https://www.wiki.ed.ac.uk/display/ecdfwiki/OpenCL+quick+start
Upvotes: 2
Reputation: 8428
A simply addition loop will usually end up being fast enough, as the compiler will vectorize it: http://gcc.gnu.org/projects/tree-ssa/vectorization.html, outputting parallel instructions which will operate on four elements of the arrays at once.
Upvotes: 2
Reputation: 16726
Not that I know of.
Is the obvious loop slow enough that you really do need something "faster"? How could you improve on it?
Upvotes: 0