OJFord
OJFord

Reputation: 11140

Fastest way to reset a value in every struct element of a vector?

Very much like this question, except that instead vector<int> I have vector<struct myType>.

If I want to reset (or for that matter, set to some value) myType.myVar for every element in the vector, what's the most efficient method?

Right now I'm iterating through:

for(int i=0; i<myVec.size(); i++) myVec.at(i).myVar = 0;

But since vectors are guaranteed to be stored contiguously, there's surely a better way?

Upvotes: 8

Views: 1457

Answers (6)

Nikos Athanasiou
Nikos Athanasiou

Reputation: 31549

One of the fastest ways would be to perform loop unwinding and break the speed limit posed by conventional for loops that cause great cash spills. In your case, as it's a run time thing, there's no way to apply template metaprogramming so a variation on good old Duff's device would do the trick

#include <iostream>
#include <vector>

using namespace std;

struct myStruct {
    int a;
    double b;
};

int main() 
{
    std::vector<myStruct> mV(20);

    double val(20);            // the new value you want to reset to
    int n = (mV.size()+7) / 8; // 8 is the size of the block unroll
    auto to = mV.begin();      // 

    switch(mV.size()%8)
    {
       case 0: do { (*to++).b = val;
       case 7:      (*to++).b = val;
       case 6:      (*to++).b = val;
       case 5:      (*to++).b = val;
       case 4:      (*to++).b = val;
       case 3:      (*to++).b = val;
       case 2:      (*to++).b = val;
       case 1:      (*to++).b = val;
        } while (--n>0);
    }

    // just printing to verify that the value is set
    for (auto i : mV) std::cout << i.b << std::endl;

    return 0;
}

Here I choose to perform an 8-block unwind, to reset the value (let's say) b of a myStruct structure. The block size can be tweaked and loops are effectively unrolled. Remember this is the underlying technique in memcpy and one of the optimizations (loop unrolling in general) a compiler will attempt (actually they're quite good at this so we might as well let them do their job).

Upvotes: 2

rici
rici

Reputation: 241861

Beware of spending a lot of time thinking about optimization details which the compiler will just take care of for you.

Here are four implementations of what I understand the OP to be, along with the code generated using gcc 4.8 with --std=c++11 -O3 -S

Declarations:

#include <algorithm>
#include <vector>

struct T {
  int irrelevant;
  int relevant;
  double trailing;
};

Explicit loop implementations, roughly from answers and comments provided to OP. Both produced identical machine code, aside from labels.

                                                        .cfi_startproc
                                                        movq    (%rdi), %rsi
void clear_relevant(std::vector<T>* vecp) {             movq    8(%rdi), %rcx
  for(unsigned i=0; i<vecp->size(); i++) {              xorl    %edx, %edx
    vecp->at(i).relevant = 0;                           xorl    %eax, %eax
  }                                                     subq    %rsi, %rcx
}                                                       sarq    $4, %rcx
                                                        testq   %rcx, %rcx
                                                        je      .L1
                                                        .p2align 4,,10
                                                        .p2align 3
                                                .L5:
void clear_relevant2(std::vector<T>* vecp) {            salq    $4, %rdx
  std::vector<T>& vec = *vecp;                          addl    $1, %eax
  auto s = vec.size();                                  movl    $0, 4(%rsi,%rdx)
  for (unsigned i = 0; i < s; ++i) {                    movl    %eax, %edx
    vec[i].relevant = 0;                                cmpq    %rcx, %rdx
  }                                                     jb      .L5
}                                               .L1:
                                                        rep ret
                                                        .cfi_endproc

Two other versions, one using std::for_each and the other one using the range for syntax. Here there is a subtle difference in the code for the two versions (other than the labels):

                                                        .cfi_startproc
                                                        movq    8(%rdi), %rdx
                                                        movq    (%rdi), %rax
                                                        cmpq    %rax, %rdx
                                                        je      .L17
void clear_relevant3(std::vector<T>* vecp) {            .p2align 4,,10
  for (auto& p : *vecp) p.relevant = 0;                 .p2align 3
}                                               .L21:
                                                        movl    $0, 4(%rax)
                                                        addq    $16, %rax
                                                        cmpq    %rax, %rdx
                                                        jne     .L21
                                                .L17:
                                                        rep ret
                                                        .cfi_endproc


                                                        .cfi_startproc
                                                        movq    8(%rdi), %rdx
                                                        movq    (%rdi), %rax
                                                        cmpq    %rdx, %rax
void clear_relevant4(std::vector<T>* vecp) {            je      .L12
  std::for_each(vecp->begin(), vecp->end(),             .p2align 4,,10
                [](T& o){o.relevant=0;});               .p2align 3
}                                               .L16:
                                                        movl    $0, 4(%rax)
                                                        addq    $16, %rax
                                                        cmpq    %rax, %rdx
                                                        jne     .L16
                                                .L12:
                                                        rep ret
                                                        .cfi_endproc

Upvotes: 1

Mantosh Kumar
Mantosh Kumar

Reputation: 5741

Instead of the below code

for(int i=0; i<myVec.size(); i++) myVec.at(i).myVar = 0;

do it as follows:

size_t sz = myVec.size();
for(int i=0; i<sz; ++i) myVec[i].myVar = 0;

As the "at" method internally checks whether index is out of range or not. But as your loop index is taking care(myVec.size()), you can avoid the extra check. Otherwise this is the fastest way to do it.

EDIT

In addition to that, we can store the size() of vector prior to executing the for loop.This would ensure that there is no further call of method size() inside the for loop.

Upvotes: 6

AdUki
AdUki

Reputation: 392

Also pre increment ++i takes few instructions less than post increment i++. Explanation here

Upvotes: 1

Shoe
Shoe

Reputation: 76280

Resetting will need to traverse every element of the vector, so it will need at least O(n) complexity. Your current algorithm takes O(n).

In this particular case you can use operator[] instead of at (that might throw an exception). But I doubt that's the bottleneck of your application.

On this note you should probably use std::fill:

std::fill(myVec.begin(), myVec.end(), 0);

But unless you want to go byte level and set a chunk of memory to 0, which will not only cause you headaches but will also make you lose portability in most cases, there's nothing to improve here.

Upvotes: 7

edmz
edmz

Reputation: 8494

In addition to what's been said before, you should consider that if you turn on optimizations, the compiler will likely perform loop-unrolling which will make the loop itself faster.

Upvotes: 1

Related Questions