Reputation: 44505

Silly question about arrays in c and/or c++

Suppose I have two integer arrays a and b with 10 ints per array. Is there a way I can add the contents of b[i] to a[i] using some "memset" or "memcopy" trick? I'm just looking for something faster than the obvious for loop w/ a[i] += b[i] etc.

Upvotes: 2

Answers (6)

Olof Forshell

Reputation: 3274

"Silly" - I think it's an excellent question!

You say "adding" not "copying" and I'm assuming x86:

void addintvector (int *dstp, const int *srcp, int nints)
{
  int *endp;

  endp=dst+nints;
  nints=srcp-dstp;    // reuse nints

  while (dstp!=endp)
  {
    *dstp+=*(dstp+nints);    // makes use of the [base+index*4] x86 addressing
    dstp+=1;    // some prefer ++dstp but I don't when it comes to pointers
  }
}

The loop should translate into

add_label:
  mov eax,[ebx+esi*4]
  add [ebx],eax
  add ebx,4
  cmp ebx,edx
  jne add_label

That's five instructions per loop: it won't get much faster than that!

It's also easy to clone into subtract, divide and multiply variants.

Some speak of using a GPU but this requires that 1. the GPU interfaces with applications and 2. your array is large enough to overcome the associated overhead.

To overcome the call/return overhead you could experiment with declaring it inline.

Edit

I just read your comment "since it's for a game on a mobile device" and I guess it's not an x86 platform and therefore probably does not have a reg+reg*scale addressing mode. If that is the case the code should be written

void addintvector (int *dstp, const int *srcp, int nints)
{
  int *endp;

  endp=dst+nints;

  while (dstp!=endp)
  {
    *dstp+=*srcp;
    srcp+=1;
    dstp+=1;
  }
}

Not knowing which architecture you're targeting but assuming RISC I guess the code will be eight instructions long instead (in "unoptimized" psuedocode):

add_label:
  mov tempreg1,[srcreg]
  mov tempreg2,[dstreg]
  add tempreg2,tempreg1
  mov [dstreg],tempreg2
  add srcreg,4
  add dstreg,4
  cmp dstreg,endreg
  jne add_label

Upvotes: 3

Björn Pollex

Reputation: 76876

An std::valarray seems like a good choice.

#include <valarray>
#include <algorithm>
#include <iostream>
#include <iterator>

int main()
{
    std::valarray<int> a(3, 10);
    std::valarray<int> b(4, 10);

    std::valarray<int> result = a + b;

    std::copy(&result[0], &result[0] + result.size(), 
        std::ostream_iterator<int>(std::cout, " "));

    return 0;
}

a and b are arrays with ten elements, 3 and 4 respectively. Adding two valarrays performs an element-wise addition. There are many other arithmetical operations defined for valarrays.

You would have to test if this is any faster than an explicit loop. Since valarrays are designed for such operations, the implementation might be in some way optimized.

Upvotes: 1

Jens Gustedt

Reputation: 78973

If you are willing to use "pure" C there are variadic macros in C99. Use P99 for unrolling:

#include "p99_for.h"
#define ADDIT(Y, X, I) X[I] += Y[I]
#define ADD_MORE(Y, X, N) P99_FOR(Y, N, P00_SEP, ADDIT, P99_DUPL(N, X))

A line like

ADD_MORE(A, B, 3);

Then expands to

B[0] += A[0]; B[1] += A[1]; B[2] += A[2];

Upvotes: 1

Oleg Svechkarenko

Reputation: 2516

Maybe it worth consider OpenCL. If you have a lot of vector or matrix tasks, let'm solve GPU. Take a look at sample with sum of vectors https://www.wiki.ed.ac.uk/display/ecdfwiki/OpenCL+quick+start

Upvotes: 2

Jeremy Salwen

Reputation: 8428

A simply addition loop will usually end up being fast enough, as the compiler will vectorize it: http://gcc.gnu.org/projects/tree-ssa/vectorization.html, outputting parallel instructions which will operate on four elements of the arrays at once.

Upvotes: 2

Jon

Reputation: 16726

Not that I know of.

Is the obvious loop slow enough that you really do need something "faster"? How could you improve on it?

Upvotes: 0

Silly question about arrays in c and/or c++

Answers (6)

Related Questions