Christos Maris

Reputation: 107

Is re-declaring variables in every iteration faster than resetting them after every iteration?

So I have a question regarding the performance of two different coding techniques. Can you help me understand which one is faster/better and why?

Here is the first technique:

int x, y, i;
for(i=0; i<10; i++)
{
    //do stuff with x and y
}
//reset x and y to zero
x=0; 
y=0;

And here is the second one:

int i;
for(i=0; i<10; i++)
{
    int x, y;
    //do the same stuff with x and y as above
}

So which coding technique is better? Also if you know a better one and/or any site/article etc. where I can read about this and more performance related stuff I would love to have that also!

Upvotes: 1

Views: 1133

Answers (5)

Jerry Coffin

Reputation: 490713

In the specific case of int variables, it makes little (or no) difference.

For variables of more complex types, especially something with a constructor that (for example) allocates some memory dynamically, re-creating the variable every iteration of a loop may be substantially slower than re-initializing it instead. For example:

#include <vector>
#include <chrono>
#include <numeric>
#include <iostream>

unsigned long long versionA() {
    std::vector<int> x;
    unsigned long long total = 0;

    for (int j = 0; j < 1000; j++) {
        x.clear();
        for (int i = 0; i < 1000; i++)
            x.push_back(i);
        total += std::accumulate(x.begin(), x.end(), 0ULL);
    }
    return total;
}

unsigned long long versionB() {
    unsigned long long total = 0;

    for (int j = 0; j < 1000; j++) {
        std::vector<int> x;
        for (int i = 0; i < 1000; i++)
            x.push_back(i);
        total += std::accumulate(x.begin(), x.end(), 0ULL);
    }
    return total;
}

template <class F>
void timer(F f) {
    using namespace std::chrono;

    auto start = high_resolution_clock::now();
    auto result = f();
    auto stop = high_resolution_clock::now();

    std::cout << "Result: " << result << "\n";
    std::cout << "Time:   " << duration_cast<microseconds>(stop - start).count() << "\n";
}

int main() {
    timer(versionA);
    timer(versionB);
}

At least when I run it, there's a fairly substantial difference between the two methods:

Result: 499500000
Time:   5114
Result: 499500000
Time:   13196

In this case, creating a new vector every iteration takes more than twice as long as clearing an existing vector every iteration instead.

For what it's worth, there are probably two separate factors contributing to the speed difference:

  1. Initial creation of the vector.
  2. Re-allocating memory as elements are added to the vector.

When we clear() a vector, that removes the existing elements but retains the memory that's currently allocated, so in a case like this where we use the same size on every iteration of the outer loop, the version that just resets the vector doesn't need to allocate any memory on subsequent iterations. If we add x.reserve(1000); immediately after defining the vector in versionB, the difference shrinks substantially (at least in my testing, not quite tied in speed, but pretty close).
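The same effect can be sketched in plain C, where the allocation is explicit. This is only a rough analogue of the two versions above, not a translation of them: the names sum_reuse_buffer and sum_fresh_buffer and the constant N are made up for illustration, and a single fixed-size malloc stands in for the vector's incremental growth.

```c
#include <stdlib.h>

#define N 1000  /* hypothetical size, matching the loop bounds above */

/* Allocate once, then overwrite the same buffer on every outer pass.
   Returns 0 if the allocation fails. */
unsigned long long sum_reuse_buffer(void) {
    unsigned long long total = 0;
    int *buf = malloc(N * sizeof *buf);
    if (buf == NULL)
        return 0;

    for (int j = 0; j < N; j++) {
        for (int i = 0; i < N; i++)
            buf[i] = i;
        for (int i = 0; i < N; i++)
            total += (unsigned long long)buf[i];
    }
    free(buf);
    return total;
}

/* Allocate and free a new buffer on every outer pass.
   Returns 0 if any allocation fails. */
unsigned long long sum_fresh_buffer(void) {
    unsigned long long total = 0;

    for (int j = 0; j < N; j++) {
        int *buf = malloc(N * sizeof *buf);
        if (buf == NULL)
            return 0;
        for (int i = 0; i < N; i++)
            buf[i] = i;
        for (int i = 0; i < N; i++)
            total += (unsigned long long)buf[i];
        free(buf);
    }
    return total;
}
```

Both functions compute the same total; they differ only in how many times the allocator is called (once versus N times). How much that costs depends on the allocator and the buffer size, so it is worth timing on your own system.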

Upvotes: 0

tniles

Reputation: 327

Best Practices

So which coding technique is better?

As others have pointed out, given a sufficiently mature/modern compiler, the performance difference will likely be nil due to optimization. Instead, the preferred code is determined by a set of ideas known as best practices.

Limiting Scope

"Scope" describes the range of access in your code. Assuming the intended scope is to be limited to within the loop itself, x and y should be declared inside the loop as the compiler will prevent you from using them later on in your function. However, in your OP you show them being reset, which implies they will be used again later for other purposes. In this case, you must declare them towards the top (e.g. outside the loop) so you can use them later.

Here's some code you can use to demonstrate the limiting of the scope:

#include <stdio.h>

#define IS_SCOPE_LIMITED

int main ( void )
{
  int i;

#ifndef IS_SCOPE_LIMITED
  int x, y;                 // compiler will not complain, scope is generous
#endif

  for(i=0; i<10; i++)
  {
#ifdef IS_SCOPE_LIMITED
    int x, y;              // compiler will complain about use outside of loop
#endif
    x = i;
    y = x+1;
    y++;
  }

  printf("X is %d and Y is %d\n", x, y);
}

To test the scope, comment out the #define towards the top. Compile with gcc -Wall loopVars.c -o loopVars and run with ./loopVars.

Benchmarking and Profiling

If you're still concerned about performance, perhaps because you have some unusual operations involving these variables, then test, test, and test again: benchmark or profile your code. Again, with optimizations you probably won't find significant (if any) differences, because the compiler will have done all this (allocation of variable space) prior to runtime.
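As a starting point for such a test, here is a minimal timing sketch built on the standard clock() function. The names work_fn, cpu_seconds, and sum_inner_decl are hypothetical, and the workload is only a stand-in for whatever your loop really does.

```c
#include <time.h>

/* Hypothetical workload type: any function with this signature can be timed. */
typedef long (*work_fn)(void);

/* Run fn `repeats` times and return the CPU time used, in seconds.
   The last return value is stored in *result so the work stays observable. */
double cpu_seconds(work_fn fn, int repeats, long *result) {
    clock_t start = clock();
    for (int r = 0; r < repeats; r++)
        *result = fn();
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Example workload: x and y are declared inside the loop body. */
long sum_inner_decl(void) {
    long total = 0;
    for (int i = 0; i < 10; i++) {
        int x = i, y = x + 1;
        total += x + y;
    }
    return total;
}
```

Call cpu_seconds once per variant with a large repeats value and compare the results. Keep in mind that an optimizing compiler may simplify a workload this small down to a constant, and that clock() has limited resolution, so treat the numbers as a rough guide and use a profiler for anything that matters.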

UPDATE

To demonstrate this another way, you could remove the #ifdef and the #ifndef from the code (also removing each #endif), and add a line immediately preceding the printf such as x=2; y=3;. What you will find is that the code compiles and runs, and the output is "X is 2 and Y is 3". This is legal because the two scopes prevent the identically-named variables from competing with each other. Of course, this is a bad idea: you now have multiple variables with identical names within the same piece of code, and in more complex code this will not be as easy to read and maintain.
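The shadowing can be isolated in a small function. This sketch differs slightly from the experiment described above: the outer pair is set before the loop rather than just before the printf, so the return value shows that the loop never touched it. The function name outer_after_loop is made up for illustration.

```c
/* Returns the outer x + y after a loop that uses its own inner x and y. */
int outer_after_loop(void) {
    int x = 2, y = 3;            /* outer variables */

    for (int i = 0; i < 10; i++) {
        int x, y;                /* inner variables shadow the outer ones */
        x = i;
        y = x + 1;
        (void)y;                 /* silence "set but not used" warnings */
    }

    return x + y;                /* refers to the outer x and y again */
}
```

The function returns 5 because every assignment inside the loop body goes to the inner pair. Compiling with gcc -Wshadow makes the compiler point out declarations like the inner one.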

Upvotes: 0

Samuel Peter

Reputation: 4164

It does not matter at all, because compilers don't automatically translate variable declaration to memory or register allocation. The difference between the two samples is that in the first case the variables are visible outside of the loop body, and in the second case they are not. However this difference is at the C level only, and if you don't use the variables outside the loop it will result in the same compiled code.

The compiler has two options for where to store a local variable: on the stack or in a register. For each variable you use in your program, the compiler has to choose where it is going to live. If it lives on the stack, the compiler needs to decrement the stack pointer to make room for it. But this decrement does not happen at the point of declaration; typically it is done at the beginning of the function, where the stack pointer is decremented once by an amount sufficient to hold all of the stack-allocated variables. If the variable lives only in a register, no initialization needs to be done, and the register is used as the destination when you first do an assignment. The important thing is that the compiler can and will re-use memory locations and registers that were previously used for variables which are now out of scope.

For illustration, I made two test programs. I used 10000 iterations instead of 10 because otherwise the compiler would unroll the loop at high optimization levels. The programs use rand to make for a quick and portable demo, but it should not be used in production code.

declare_once.c :

#include <stdio.h>
#include <time.h>
#include <stdlib.h>

int main(void) {
    srand(time(NULL));

    int x, y, i;
    for (i = 0; i < 10000; i++) {
        x = rand();
        y = rand();
        printf("Got %d and %d !\n", x, y);
    }

    return 0;
}

redeclare.c is the same except for the loop which is :

for (i = 0; i < 10000; i++) {
    int x, y;
    x = rand();
    y = rand();
    printf("Got %d and %d !\n", x, y);
}

I compiled the programs using Apple's LLVM version 7.3.0 on x86_64 Mac. I asked it for assembly output which I reproduced below, leaving out the parts unrelated to the question.

clang -O0 -S declare_once.c -o declare_once.S :

_main:
## Function prologue
    pushq   %rbp
    movq    %rsp, %rbp           ## Move the old value of the stack 
                                 ## pointer (%rsp) to the base pointer 
                                 ## (%rbp), which will be used to 
                                 ## address stack variables

    subq    $32, %rsp            ## Decrement the stack pointer by 32 
                                 ## to make room for up to 32 bytes 
                                 ## worth of stack variables including 
                                 ## x and y

## Removed code that calls srand

    movl    $0, -16(%rbp)        ## i = 0. i has been assigned to the 4 
                                 ## bytes starting at address -16(%rbp),
                                 ## which means 16 less than the base  
                                 ## pointer (so here, 16 more than the 
                                 ## stack pointer).

LBB0_1:                                 
    cmpl    $10000, -16(%rbp)
    jge LBB0_4
    callq   _rand                ## Call rand. The return value will be in %eax

    movl    %eax, -8(%rbp)       ## Assign the return value of rand to x. 
                                 ## x has been assigned to the 4 bytes
                                 ## starting at -8(%rbp)
    callq   _rand
    leaq    L_.str(%rip), %rdi
    movl    %eax, -12(%rbp)      ## Assign the return value of rand to y. 
                                 ## y has been assigned to the 4 bytes
                                 ## starting at -12(%rbp)
    movl    -8(%rbp), %esi
    movl    -12(%rbp), %edx
    movb    $0, %al
    callq   _printf
    movl    %eax, -20(%rbp)
    movl    -16(%rbp), %eax
    addl    $1, %eax
    movl    %eax, -16(%rbp)
    jmp LBB0_1
LBB0_4:
    xorl    %eax, %eax
    addq    $32, %rsp            ## Add 32 to the stack pointer : 
                                 ## deallocate all stack variables 
                                 ## including x and y
    popq    %rbp
    retq

The assembly output for redeclare.c is almost exactly the same, except that for some reason x and y get assigned to -16(%rbp) and -12(%rbp) respectively, and i gets assigned to -8(%rbp). I copy-pasted only the loop :

    movl    $0, -8(%rbp)          ## i = 0
LBB0_1:
    cmpl    $10000, -8(%rbp)
    jge LBB0_4
    callq   _rand
    movl    %eax, -16(%rbp)       ## x = rand();
    callq   _rand
    leaq    L_.str(%rip), %rdi
    movl    %eax, -12(%rbp)       ## y = rand();
    movl    -16(%rbp), %esi
    movl    -12(%rbp), %edx
    movb    $0, %al
    callq   _printf
    movl    %eax, -20(%rbp)
    movl    -8(%rbp), %eax
    addl    $1, %eax
    movl    %eax, -8(%rbp)
    jmp LBB0_1

So we see that even at -O0 the generated code is the same apart from the stack offsets. The important thing to note is that the same memory locations are reused for x and y in each loop iteration, even though they are separate variables at each iteration from the C language point of view.
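You can also observe this reuse from C itself, without reading assembly. The following is a minimal sketch with a made-up function name (passes_with_new_address); it assumes your platform provides uintptr_t, which is optional in the standard but available on mainstream compilers.

```c
#include <stdint.h>

/* Count the passes (after the first) on which a loop-local variable sits
   at a different address than it did on the first pass. */
int passes_with_new_address(int passes) {
    uintptr_t first = 0;
    int moved = 0;

    for (int i = 0; i < passes; i++) {
        volatile int x = i;      /* volatile, so x is given a memory location */
        /* Convert to an integer right away; the pointer itself must not be
           used once this x has gone out of scope. */
        uintptr_t addr = (uintptr_t)(volatile void *)&x;

        if (i == 0)
            first = addr;
        else if (addr != first)
            moved++;
    }
    return moved;
}
```

With the compilers discussed here you would expect the function to return 0, meaning every pass found x at the same address. The C standard does not promise that, though, so treat the result as an observation about one compiler rather than a guarantee.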

At -O3 the variables are kept in registers, and both programs compile to exactly the same assembly.

clang -O3 -S declare_once.c -o declare_once.S :

    movl    $10000, %ebx       ## i will be in %ebx. The compiler decided
                               ## to count down from 10000 because 
                               ## comparisons to 0 are less expensive,
                               ## so it actually does i = 10000.
    leaq    L_.str(%rip), %r14
    .align  4, 0x90
LBB0_1:
    callq   _rand
    movl    %eax, %r15d        ## x = rand(). x has been assigned to
                               ## register %r15d (the 32 least significant
                               ## bits of %r15)
    callq   _rand
    movl    %eax, %ecx         ## y = rand(). y has been assigned to
                               ## register %ecx
    xorl    %eax, %eax
    movq    %r14, %rdi
    movl    %r15d, %esi
    movl    %ecx, %edx
    callq   _printf
    decl    %ebx
    jne LBB0_1

So again, no differences between the two versions, and even though in redeclare.c we have different variables at each iteration, the same registers are re-used so that there is no allocation overhead.

Keep in mind that everything I said applies to variables that are assigned in each loop iteration, which seems to be what you were thinking. If on the other hand you want to use the same values for all iterations, of course the assignment should be done before the loop.
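As a small illustration of that last point, here is a sketch in which one value is the same on every pass and another changes on every pass. The function name scaled_sum and its parameters are hypothetical.

```c
/* Sum scale * i for i in [0, n), where scale does not change between passes. */
long scaled_sum(int n, int a, int b) {
    const int scale = a * b;     /* same for all passes: assigned before the loop */
    long total = 0;

    for (int i = 0; i < n; i++) {
        int term = scale * i;    /* differs per pass: assigned inside the loop */
        total += term;
    }
    return total;
}
```

An optimizing compiler will often move such an invariant computation out of the loop on its own, but writing it before the loop also tells the reader that the value does not depend on the iteration.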

Upvotes: 1

Jonathon Reinhart

Reputation: 137547

Declaring the variables in the inner-most scope where you'll use them:

int i;
for(i=0; i<10; i++)
{
    int x, y;
    //do the same stuff with x and y as above
}

is always going to be preferred. The biggest improvement is that you've limited the scope of the x and y variables. This prevents you from accidentally using them where you didn't intend to.

Even if you use "the same" variables again:

int i;
for(i=0; i<10; i++)
{
    int x, y;
    //do the same stuff with x and y as above
}

for(i=0; i<10; i++)
{
    int x, y;
    //do the same stuff with x and y as above
}

there will be no performance impact whatsoever. The statement int x, y has practically no effect at runtime.

Most modern compilers will calculate the total size of all local variables, and emit code to reserve the space on the stack (e.g. sub esp, 90h) once in the function prologue. The space for these variables will almost certainly be re-used from one "version" of x to the next. The declaration is purely a lexical construct that the compiler uses to keep you from using that "space" on the stack where you didn't intend to.
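If you want to check this with your own compiler, a pair of functions like the following can be compiled to assembly and compared side by side. This is just a sketch; the function names and the arithmetic in the loop bodies are placeholders for your real work.

```c
/* Technique 1: x and y declared once, outside the loop. */
int sum_declared_outside(void) {
    int x, y, i, total = 0;
    for (i = 0; i < 10; i++) {
        x = i;
        y = x + 1;
        total += x * y;
    }
    return total;
}

/* Technique 2: x and y declared in the innermost scope that uses them. */
int sum_declared_inside(void) {
    int i, total = 0;
    for (i = 0; i < 10; i++) {
        int x = i;
        int y = x + 1;
        total += x * y;
    }
    return total;
}
```

Both functions return the same value, and the only difference in the source is where x and y are visible. Compiling with the -S option lets you compare the code generated for each.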

Upvotes: 1

sabbahillel

Reputation: 4425

It should not matter because you need to initialize the variables in either case. Additionally, the first case sets x and y after they are no longer being used. As a result, the reset is not needed.

Here is the first technique:

int x=0, y=0, i;
for(i=0; i<10; i++)
{
    //do stuff with x and y
    // x and y stay at the value they get set to during the pass
}
// x and y need to be reset if you want to use them again.
// or would retain whatever they became during the last pass.

If you had wanted x and y to be reset to 0 inside the loop, then you would need to write:

int x, y, i;
for(i=0; i<10; i++)
{
    //reset x and y to zero
    x=0; 
    y=0;
    //do stuff with x and y
    // Now x and y get reset before the next pass
}

The second procedure makes x and y local to the loop body, so each pass gets a fresh x and y that are dropped at the end of that pass; their values do not carry over to the next pass. The compiler works out where the variables will live at compile time, not at run time, so there is no per-pass cost for declaring them. An initializer such as = 0 is still executed on every pass, exactly like the explicit reset in the previous example.

And here is the second one:

int i;
for(i=0; i<10; i++)
{
    int x=0, y=0;
    //do the same stuff with x and y as above
    // x and y are set to 0 at the start of every pass.
}
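In C, an initializer on a block-scope declaration is evaluated each time execution reaches the declaration. A short sketch makes the difference between the two placements visible; the function names last_value_inner and last_value_outer are made up for illustration.

```c
/* x is declared and initialized inside the body: it restarts at 0 each pass. */
int last_value_inner(void) {
    int last = -1;
    for (int i = 0; i < 10; i++) {
        int x = 0;               /* runs on every pass */
        x += 1;
        last = x;
    }
    return last;
}

/* x is declared and initialized once, outside the loop: it accumulates. */
int last_value_outer(void) {
    int x = 0;                   /* runs once */
    for (int i = 0; i < 10; i++)
        x += 1;
    return x;
}
```

last_value_inner returns 1 and last_value_outer returns 10, so where you put the declaration changes the meaning of the code whenever a value is supposed to survive from one pass to the next.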

Upvotes: 0
