Reputation: 123
I am trying to make my code more efficient by reducing the number of conditional statements as I am going to write the code in assembly in the future. Do you guys see any way to reduce the statements to be more efficient?
if (j==18)
{
store=12;
}
else if(j==30)
{
store=10;
}
else if(j==42)
{
store=8;
}
else if(j==54)
{
store=6;
}
else if(j==66)
{
store=4;
}
else if(j==78)
{
store=2;
}
else if(j==92)
{
store=2;
}
else if(j==106)
{
store=4;
}
else if(j==120)
{
store=6;
}
else if(j==134)
{
store=8;
}
else if(j==148)
{
store=10;
}
else if(j==162)
{
store=12;
}
else store=1;
Upvotes: 2
Views: 2867
Reputation: 16606
if (j < 18 || 162 < j) {
store = 1;
} else if (j < 90) {
int mod12 = (j-6) % 12;
// ((j-6)/12) -> 18=>1, .. 78=>6 (/6 used to get *2)
store = (mod12) ? 1 : 14 - ((j-6) / 6);
} else {
int mod14 = (j-8) % 14;
// ((j-8)/14) -> 92=>6, ... 162=>11 (/7 used to get *2)
store = (mod14) ? 1 : ((j-8) / 7) - 10;
}
Which can be implemented straightforwardly enough by manual assembler, although modern C++ compiler will do better than ordinary human, for example gcc 7.3 produces something a bit better than I had originally on my mind (and something I wouldn't want to write by hand).
Actually the gcc can be hand-holded a bit to understand that formula better.
Modified source:
if (j < 18 || 162 < j) {
store = 1;
} else if (j < 90) {
int mod12 = (j-6) % 12;
// ((j-6)/12) -> 18=>1, .. 78=>6
store = (mod12) ? 1 : 14 - 2*((j-6) / 12);
} else {
int mod14 = (j-8) % 14;
// ((j-8)/14) -> 92=>6, ... 162=>11
store = (mod14) ? 1 : 2*((j-8) / 14) - 10;
}
And for completeness, here is switch
version (didn't benchmark it, but should be quite slower than the code above): https://godbolt.org/g/ELNCYD
Looks like the gcc was unable to figure it out and does use many "ifs" for that one.
NEW: so after checking all those compilers/and comments, this looks to be most [performance] optimal x86_64 solution (subroutine taking "j" in edi
and returning "store" in rax
) (NASM syntax):
; input: edi = j, output: rax = store
storeByJ:
sub edi, 18
cmp edi, (162-18)
ja .JoutOfRange ; j < 18 || 162 < j -> return 1
; rdi = 0 .. 162-18 = 144
movzx eax, byte [rel .JtoStoreLUT + rdi]
ret
.JoutOfRange:
mov eax,1
ret
.JtoStoreLUT:
; 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F
db 12, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ; 18->12
db 10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ; 30->10
db 8, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ; 42->8
db 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ; 54->6
db 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ; 66->4
db 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ; 78->2
db 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ; 92->2
db 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ;106->4
db 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ;120->6
db 8, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ;134->8
db 10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ;148->10
db 12 ;162->12
If you would have much larger range (more values) being still +12 and after some point +14 differentiated, the formula may get better (by saving the cache memory from the too-big LUT (Look-Up-Table)), but at this moment this code is about 170B long even with the LUT data, so it will very likely fit even with the whole loop using it.
EDIT: one more variant, with half-sized LUT, but using ror
to filter out odd values by single jump in the range test (I'm not sure about performance, but as with any other code-efficient questions, profiling is absolutely essential, above all the theoretical reasoning, if you can't confirm your theory in benchmark, fix your benchmark (lot more likely), or find out surprising intricacies of the CPU internals and how you misunderstood something... (but this is still likely enough to happen often)).. and the range branch is eliminated by using cmovCC
(total 97B):
; input: edi = j, output: eax = store
storeByJ:
mov eax, 1
sub edi, 18
ror edi, 1 ; /2 but keeps "odd" bit in the edi
; to make it fail range check on next line
cmp edi, (162-18)/2
cmova edi, eax ; j < 18 || 162 < j || j&1 -> return 1 (from LUT[1])
; rdi = 0 .. (162-18)/2 = 72 # rdi = (j-18)/2
movzx eax, byte [rel .JtoStoreLUT + rdi]
ret
.JtoStoreLUT:
; 0, 1, 2, 3, 4, 5, 6
db 12, 1, 1, 1, 1, 1 ; 18->12
db 10, 1, 1, 1, 1, 1 ; 30->10
db 8, 1, 1, 1, 1, 1 ; 42->8
db 6, 1, 1, 1, 1, 1 ; 54->6
db 4, 1, 1, 1, 1, 1 ; 66->4
db 2, 1, 1, 1, 1, 1, 1 ; 78->2
db 2, 1, 1, 1, 1, 1, 1 ; 92->2
db 4, 1, 1, 1, 1, 1, 1 ; 106->2
db 6, 1, 1, 1, 1, 1, 1 ; 120->2
db 8, 1, 1, 1, 1, 1, 1 ; 134->2
db 10, 1, 1, 1, 1, 1, 1 ; 148->2
db 12 ; 162->2
Upvotes: 5
Reputation: 140
I would try to use an array 162 char or even less if those are all the j you must deal with. It sounds as a memory waste but consider all those instruction saved, they occupy memory too. M2c
Upvotes: 3
Reputation: 2113
Well, it would definitely be faster (and more compact) to use a switch statement instead of a series of if statements. Most languages can optimize a switch statement much better than a series of ifs. Your code would look something like this in switch statement form:
switch(j) {
case 18:
store = 12;
break;
case 30:
store = 10;
break;
// and so on
default:
store = 1;
}
Of course, this won't be as fast as a version of your code without conditionals (probably). If you could come up with a mathematical formula to calculate the value of store
in terms of j
, that would (probably) be a lot faster.
Upvotes: 4