Stephen Blinkhorn

Reputation: 635

A good way to do a fast divide in C++?

Sometimes I see, and have used, the following variation for a fast divide in C++ with floating-point numbers.

// original loop (x is defined elsewhere, e.g. the incoming sample)
double y = 44100.0;
for (int i = 0; i < 10000; ++i) {
    double z = x / y;
}

// alternative: precompute the reciprocal once and multiply
double y = 44100.0;
double y_div = 1.0 / y;

for (int i = 0; i < 10000; ++i) {
    double z = x * y_div;
}
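
For what it's worth, a quick standalone check of whether the two forms give identical results (the value of x here is just an arbitrary stand-in, since the snippet doesn't show where x comes from):

#include <cstdio>

int main() {
    double x = 123.456;            // arbitrary stand-in for the real x
    double y = 44100.0;
    double y_div = 1.0 / y;

    double a = x / y;
    double b = x * y_div;

    std::printf("x / y     = %.17g\n", a);
    std::printf("x * y_div = %.17g\n", b);
    std::printf("identical: %s\n", (a == b) ? "yes" : "no");
    return 0;
}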

But someone hinted recently that this might not be the most accurate way.

Any thoughts?

Upvotes: 15

Views: 22410

Answers (11)

joe

Reputation: 11

Here's the problem with doing it with a reciprocal: you still have to do a division to compute the reciprocal before you can actually divide by y. Unless you're only dividing by the same y over and over, in which case I suppose this may be useful. Otherwise it is not very practical, since division is done in binary with similar algorithms anyway.

Upvotes: 1

spoulson

Reputation: 21601

When processing audio, I prefer to use fixed-point math instead. I suppose this depends on the level of precision you need, but let's assume that 16.16 fixed-point integers will do (meaning the high 16 bits are the whole number and the low 16 the fraction). Now all calculations can be done as simple integer math:

unsigned int y = 44100u << 16;   // unsigned literal avoids signed overflow in the shift
unsigned int z = x / (y >> 16);  // divisor must be the whole number portion

Or with macros to help:

#define FP_INT(x) ((unsigned int)(x) << 16)
#define FP_MUL(x, y) ((x) * ((y) >> 16))
#define FP_DIV(x, y) ((x) / ((y) >> 16))

unsigned int y = FP_INT(44100);
unsigned int z = FP_MUL(x, y);
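
If you need to keep the fractional bits of both operands, a common 16.16 variant (a sketch, not part of the macros above) widens the intermediate to 64 bits:

#include <cstdint>

// 16.16 multiply/divide that keeps the fraction of both operands;
// the 64-bit intermediate avoids overflow around the shift.
static inline std::uint32_t fp_mul(std::uint32_t a, std::uint32_t b) {
    return (std::uint32_t)(((std::uint64_t)a * b) >> 16);
}

static inline std::uint32_t fp_div(std::uint32_t a, std::uint32_t b) {
    return (std::uint32_t)(((std::uint64_t)a << 16) / b);
}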

Upvotes: 2

Debilski

Reputation: 67898

In your example, compiling the division with gcc and the options -O3 -ffast-math yields the same code as the multiplication without -ffast-math. (In a test setup with enough volatiles around that the loop is still there.)

So if you really want to optimise those divisions away and don't care about the consequences, that's the way to go. Multiplication seems to be roughly 15 times faster, by the way.
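
A minimal harness along those lines (volatile keeps the compiler from deleting the loop; compile with and without -ffast-math and compare the generated assembly):

// e.g.  g++ -O3 -S div.cpp              -> plain division in the loop
//       g++ -O3 -ffast-math -S div.cpp  -> gcc is free to use a reciprocal multiply
volatile double x = 123.456;
volatile double z;

int main() {
    double y = 44100.0;
    for (int i = 0; i < 10000; ++i) {
        z = x / y;
    }
    return 0;
}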

Upvotes: 3

jcoder

Reputation: 30055

I presume from the original post that x is not really a constant but data from an array, so x[i] is likely the source of each value, and similarly the output will be stored somewhere in memory.

If the loop count really is 10,000 as in the original post, it will make little difference which form you use: the whole loop won't take even a fraction of a millisecond on a modern CPU. If the loop count is very much higher, perhaps 1,000,000 or more, I would expect the cost of memory access to make the faster operation largely irrelevant, since you will mostly be waiting for the data anyway.

I suggest trying both in your code and testing whether it actually makes any significant difference in run time; if it doesn't, just write the straightforward division if that's what the algorithm calls for.
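
A rough way to do that test (a sketch only; the buffer of one million values stands in for your real data):

#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> buf(1000000, 123.456);   // stand-in for real input data
    const double y = 44100.0;
    const double y_div = 1.0 / y;
    double sink = 0.0;                           // keeps the work observable

    auto t0 = std::chrono::steady_clock::now();
    for (double x : buf) sink += x / y;
    auto t1 = std::chrono::steady_clock::now();
    for (double x : buf) sink += x * y_div;
    auto t2 = std::chrono::steady_clock::now();

    using std::chrono::duration_cast;
    using std::chrono::microseconds;
    std::printf("divide:   %lld us\n", (long long)duration_cast<microseconds>(t1 - t0).count());
    std::printf("multiply: %lld us\n", (long long)duration_cast<microseconds>(t2 - t1).count());
    std::printf("(checksum %f)\n", sink);
    return 0;
}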

Upvotes: 1

MIke V

Reputation: 27

Audio, huh? It's not just 44,100 divisions per second when you have, say, five tracks of audio running at once. Even a simple fader consumes cycles, after all. And that's just a fairly bare-bones, minimal example: what if you want an EQ and a compressor? Maybe a little reverb? Your total math budget, so to speak, gets eaten up quickly. It does make sense to wring out a little extra performance in those cases.

Profilers are good. Profilers are your friend, and they deserve your gratitude. But you already know where the main bottleneck is in audio work: it's the loop that processes samples, and the faster you can make it, the happier your users will be. Use everything you can: multiply by reciprocals, shift bits whenever possible (exp(x+y) = exp(x)*exp(y), after all), use lookup tables, pass variables by reference instead of by value (less pushing/popping on the stack), refactor terms, and so forth. (If you're good, you'll laugh at these elementary optimizations.)
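
As a small illustration of the lookup-table idea (a sketch only: a table of precomputed reciprocals for small integer divisors):

#include <array>
#include <cstdio>

// Precompute 1/i for i in [1, 255]; dividing by any of those divisors
// then becomes a table fetch plus a multiply.
static const std::array<double, 256> kRecip = [] {
    std::array<double, 256> t{};
    for (int i = 1; i < 256; ++i) t[i] = 1.0 / i;
    return t;
}();

inline double fast_div(double x, int divisor) {
    return x * kRecip[divisor];   // caller must ensure 1 <= divisor <= 255
}

int main() {
    std::printf("%f\n", fast_div(44100.0, 3));   // ~14700
    return 0;
}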

Upvotes: 1

Michael J

Reputation: 7949

On old CPUs like the 80286, floating point maths was abysmally slow and we employed lots of trickiness to speed things up.

On modern CPUs floating point maths is blindingly fast and optimising compilers can generally do wonders with fine-tuning things.

It is almost never worth the effort to employ little micro-optimisations like that.

Try to make your code simple and idiot-proof. Only if you find a real bottleneck (using a profiler) should you think about optimising your floating point calculations.

Upvotes: 0

dwc

Reputation: 24938

Your original

// original loop:
double y = 44100.0;

for(int i=0; i<10000; ++i) {
    double z = x / y;
}

can easily be optimized to

// haha:
double y = 44100.0;
double z = x / y;

and the performance is pretty nice. ;-)

EDIT: People keep voting this down, so here's the not so funny version:

If there were a general way to make division faster for all cases, don't you think compiler writers might have happened upon it by now? Of course they would have done. Also, some of the people doing FPU circuits aren't exactly stupid, either.

So the only way you're going to get better performance is to know what specific situation you have at hand and to write optimal code for it. Most likely this is a complete waste of your time, because your program is slow for some other reason, such as performing math on loop invariants. Go find a better algorithm instead.

Upvotes: 5

Not Sure

Reputation: 5963

On just about every CPU, a floating point divide is several times as expensive as a floating point multiply, so multiplying by the inverse of your divisor is a good optimization. The downside is that you may lose a very small amount of accuracy on certain processors: e.g., on modern x86 processors, 64-bit float operations are actually computed internally using 80 bits in the default FPU mode, and storing the result to a variable rounds those extra precision bits away according to your FPU rounding mode (which defaults to round-to-nearest). This only really matters if you are chaining many float operations together and have to worry about the accumulated error.
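
To see whether that accumulation matters for your data, a quick comparison like this (just a sketch with made-up inputs) shows how far the two running sums drift apart:

#include <cstdio>

int main() {
    const double y = 44100.0;
    const double y_div = 1.0 / y;
    double sum_div = 0.0, sum_mul = 0.0;

    for (int i = 1; i <= 1000000; ++i) {
        double x = (double)i * 0.001;   // made-up sample values
        sum_div += x / y;
        sum_mul += x * y_div;
    }
    std::printf("sum via divide:   %.17g\n", sum_div);
    std::printf("sum via multiply: %.17g\n", sum_mul);
    std::printf("difference:       %.3g\n", sum_div - sum_mul);
    return 0;
}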

Upvotes: 26

KeyserSoze

Reputation: 2511

Are you looping 10,000 times simply to make the code take long enough to measure easily, or do you have 10,000 numbers to divide by the same number? If the former, put the "y_div = 1.0 / y;" inside the loop, because it's part of the operation.

If the latter, yes, floating point multiplication is generally faster than division. Don't change your code from the obvious to the arcane based on guesses, though. Benchmark first to find the slow spots, then optimize those (and take measurements before and after to make sure your idea actually is an improvement).

Upvotes: 0

mwigdahl

Reputation: 16588

Wikipedia agrees that this can be faster. The linked article also contains several other fast division algorithms that might be of interest.

I would guess that any industrial-strength modern compiler will make that optimization for you if it is going to profit you at all.

Upvotes: 9

second

Reputation: 28665

Multiplication is faster than division, so the second method is faster. It might be slightly less accurate, but unless you are doing hard-core numerics the level of accuracy should be more than enough.

Upvotes: 2
