user1010005
user1010005

Reputation: 2707

Why is modulo operator necessary?

I've read in a document that you can replace mod operation by logical and like this :

Instead:

int Limit = Value % Range;

You do:

int Limit = Value & (Range-1);

But compilers still generate mod instructions and my question is basically : Why do compilers don't use the most efficient approach if they work the same ?

Upvotes: 0

Views: 2565

Answers (3)

std''OrgnlDave
std''OrgnlDave

Reputation: 3968

As others have stated, the range has to be 2^n-1, and even then, if it's done at run-time, you have problems.

On recent architectures (let's say, anything after P4 era) the latency on integer division instructions is between 26 and 50 or so cycles worst case. A multiply, in comparison, can be 1-3 cycles and can often be done in parallel much better.

The DIV instruction returns the quotient in EAX and the remainder in EDX. The "remainder" is free (the modulus is the remainder).

If you implement something where the range is variable at run-time, if you wish to use &, you have to:

a) check if the range is 2^n-1, if so use your & codepath: which is a branch, possible cache miss etc. etc. adding huge latency potential b) if it is not 2^n-1, use a DIV instruction

Using a DIV instead of adding a branch into the equation (which is the potential to cost hundreds or even thousands of cycles in bad cases with poor cache eviction) makes DIV the obvious best choice. On top of that, if you are using & with a signed data type, conversions will be necessary (there is no & for mixed data types but there are for DIVs). In addition if the DIV is only used to branch from the modulus and the rest of the results aren't used, speculative execution can perform nicely; also performance penalties are further mitigated by multiple pipeline that can execute instructions in parallel.

You have to remember that if you are using real code, a lot of your cache will be filled with the data you are working on, and other code and data you will be working with soon or have just worked on. You really don't want to be evicting cache pages and waiting for them to page in because of branch mispredictions. In most cases with modulo, you are not just going i = 7; d = i % 4; you're using larger code that often calls a subroutine which itself is a (predicted and cached) subroutine call directly before. In addition you're probably doing it in a loop which itself is also using branch prediction; nested branch predictions with loops are handled pretty well in modern microprocessors but it just ends up being plain stupid to add to the predicting it's trying to do.

So to summarize, using DIV makes more sense on modern processors for a general usage case; it is not really an "optimization" for a compiler to generate 2^n-1 because of cache considerations and other stuff. If you really really need to fine-tune that integer divide, and your whole program depends on it, you will end up hard-coding the divisor to 2^n-1 and making bitwise & logic yourself.

Finally, this is a bit of a rant - a dedicated ALU unit for integer divides can really reduce the latency to around 6-8 cycles, it just takes up a relatively large die area because the data path ends up being about 128 bits wide and nobody has the real estate for it when integer DIVs work just fine how they are.

Upvotes: 0

old_timer
old_timer

Reputation: 71556

you can replace modulo with that only if it is a power of 2. Using elementary math to replace it without a modulo

a = b % c;

can be done with

x = b % c;
a = b / (x*c);

Lets check this with an example

25 % 7 = 
25 / 7 = 3 (integer math)
25 - (3 * 7) =
25 - 21 = 4

Which is how I have to do it on my calculator anyway as I dont have a modulo operator.

Note that

25 & (7-6) = 
0x19 & 0x6 = 0x0

So your substitution does not work.

Not only do most processors not have a modulo, many do not have a divide. Check out the hackers delight book.

WHY would you want modulo? If you have burned the hardware to make a divide, you might be willing to go that extra mile to add modulo as well. Most processors take your question to the next level, why would you implement a divide in hardware when it can be done in software. The answer to your question is most processor families do not have a modulo, and many do not have a divide because it is not worth the chip real estate, power consumed, etc compared to the software solution. The software solution is less painful/costly/risky.

Now I assume your question is not what the winning poster answered. For cases where the Range is a power of two and the identity does work... First off if range is not known at compile time then you have to do a subtract and an and, two operations, and maybe an intermediate variable, that is much more costly than a modulo, the compiler would be in error to optimize to a subtract and and instead of a modulo. If the range is a power of two and is known at compile time your better/fancier compilers will optimize. There are times, esp with a variable word length instruction set where the smaller instruction may be used over the larger instruction, it might be less painful to load Range and do a modulo than to load the larger number of non-zero bits (values of Range that match your identity have a single bit set in the value, the other bits are zero, 0x100, 0x40, 0x8000, etc) and do the modulo. the load immediate plus modulo might be cheaper than the load immediate plus and, or the modulo immediate might be cheaper than the and immediate. You have to examine the instruction set and how the compiler has implemented the solution.

I suggest you post some examples of where it is not doing the optimization, and I assume we can post many examples of where the compiler has done the optimization you were expecting.

Upvotes: 14

Mysticial
Mysticial

Reputation: 471369

Um no... that only works when Range is a power of two.

For all other values, you still need the modulus % operator.

There are also some subtle (possibly implementation-defined) differences when working with negative numbers.


As a side note: Using the % operator is probably more readable too.

Upvotes: 26

Related Questions