Reputation: 33817

Do optimization like this make sense?

You have two arrays and a function that counts differences between them:

for( i = 0; i < len; ++i ) { 
    int value1 = vector1[i];
    int value2 = vector2[i];
    if( value1 != value2 ) ++num_differences;
}

As branching downgrades performance, it can be optimized to:

for( i = 0; i < len; ++i ) {
    num_differences += !!(vector1[i] != vector2[i])
}
// !!(..) is to be sure that the result is boolean 0 or 1

so there is no if clause. But does it practically make sense? With GCC (and other compilers) being so smart, does it make sense to play with such optimizations?

Upvotes: 2

Answers (4)

old_timer

Reputation: 71506

I dont think you are going to do much better, your second example is hard to read/understand for the average programmer which means two things one hard to understand and maintain, two you may be creeping into dark, less tested/supported, corners of the compiler. Drive down the road between the lines, dont wander about on the shoulder or in the wrong lane.

Go with this

for( i = 0; i < len; ++i ) { 
    int value1 = vector1[i];
    int value2 = vector2[i];
    if( value1 != value2 ) ++num_differences;
}

or this

for( i = 0; i < len; ++i ) { 
    if( vector1[i] != vector2[i] ) ++num_differences;
}

if it really is bothering you and you have properly concluded this is your performance bottleneck then time the difference between them. From the disassembly shown, and the nature of this platform, it is very difficult to properly time such things and draw the right conclusions. Too many caches, and other factors that cloud over the results, leading to false conclusions, etc. and no two x86 implementations have the same performance so if you happen to tune for your computer you are likely detuning it for another model of x86 or even the same make on a different motherboard with different I/O characteristics.

Upvotes: 1

Luchian Grigore

Reputation: 258548

Unless len is several millions large or you're comparing a lot of arrays, then no. The second version is less readable (not so much to an experienced programmer), so I'd prefer the first variant, unless this is the bottleneck (doubtful).

The following codes are generated, with optimizations:

   for( i = 0; i < 4; ++i ) { 
    int value1 = vector1[i];
    int value2 = vector2[i];
    if( value1 != value2 ) ++num_differences;
00401000  mov         ecx,dword ptr [vector1 (40301Ch)] 
00401006  xor         eax,eax 
00401008  cmp         ecx,dword ptr [vector2 (40302Ch)] 
0040100E  je          wmain+15h (401015h) 
00401010  mov         eax,1 
00401015  mov         edx,dword ptr [vector1+4 (403020h)] 
0040101B  cmp         edx,dword ptr [vector2+4 (403030h)] 
00401021  je          wmain+26h (401026h) 
00401023  add         eax,1 
00401026  mov         ecx,dword ptr [vector1+8 (403024h)] 
0040102C  cmp         ecx,dword ptr [vector2+8 (403034h)] 
00401032  je          wmain+37h (401037h) 
00401034  add         eax,1 
00401037  mov         edx,dword ptr [vector1+0Ch (403028h)] 
0040103D  cmp         edx,dword ptr [vector2+0Ch (403038h)] 
00401043  je          wmain+48h (401048h) 
00401045  add         eax,1 
   }

   for( i = 0; i < 4; ++i ) {
    num_differences += !!(vector1[i] != vector2[i]);
00401064  mov         edx,dword ptr [vector1+0Ch (403028h)] 
0040106A  xor         eax,eax 
0040106C  cmp         edx,dword ptr [vector2+0Ch (403038h)] 
00401072  mov         edx,dword ptr [vector1+8 (403024h)] 
00401078  setne       al   
0040107B  xor         ecx,ecx 
0040107D  cmp         edx,dword ptr [vector2+8 (403034h)] 
00401083  mov         edx,dword ptr [vector1+4 (403020h)] 
00401089  setne       cl   
0040108C  add         eax,ecx 
0040108E  xor         ecx,ecx 
00401090  cmp         edx,dword ptr [vector2+4 (403030h)] 
00401096  mov         edx,dword ptr [vector1 (40301Ch)] 
0040109C  setne       cl   
0040109F  add         eax,ecx 
004010A1  xor         ecx,ecx 
004010A3  cmp         edx,dword ptr [vector2 (40302Ch)] 
004010A9  setne       cl   
004010AC  add         eax,ecx 
   }

So, actually, the second version is slightly slower (theoretically). 19 instructions for the second vs. 17 for the first.

Upvotes: 2

Andrew Cooper

Reputation: 32576

The short answer is: "Trust your Compiler".

In general you're not going to see much benefit from optimisations like this unless you're working with really huge datasets. Even then you really need to benchmark the code to see if there is any improvement.

Upvotes: 4

justin

Reputation: 104698

You should compare the code the compiler generates. It may be equivalent.

The compiler's very smart, but a good engineer can certainly improve a program's performance.

Upvotes: 1

Do optimization like this make sense?

Answers (4)

Related Questions