Reputation: 2633

Accurate way of getting the modulo of a long and a double in C++

I'm working on a C++ program that involves timing, and in this program I have to determine the modulo of a time in milliseconds (a long) with respect to a double. Normally one would cast the long to a double and use fmod, or cast the double to an int/long and then use %. However, neither would give me the accuracy I need. Is there a way to easily handle this?

So what I'm looking for is this:

long a = 9999999;
double b = 1.42;
double answer = a % b;  // <<< how do I do this?

Upvotes: 2

Answers (4)

Patricia Shanahan

Reputation: 26185

This is more an extended comment than an answer.

Even if you switched to a data type that could represent all your numbers exactly, you would still have a precision problem due to measurement error, especially using modulo.

The problem is that you are measuring time in milliseconds. It is very unlikely that you have a one kilohertz computer. It is more likely that things are happening at much finer granularity, possibly at the nanosecond level. If you measure by taking the difference between two values of a millisecond clock, you can have up to a millisecond of measurement error. An activity that appears to take 10 ms may have actually taken anywhere from slightly over 9 ms to just under 11 ms, depending on when the start and end time events fell relative to clock ticks.

You can usually control measurement error by making sure your measurements are long compared to the tick length. A duration like 9999999 ms would have about one part in ten million of measurement error, generally not a problem, although it does dwarf double precision conversion rounding error. However, if you subsequently reduce modulo something between one and two, the result is practically meaningless.

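Incidentally, for reasonable elapsed times in milliseconds, conversion from long to double is exact. Assuming a 64-bit IEEE 754 double, every integer up to 2^53 is representable, so you would need to be measuring for roughly 285,000 years before the conversion introduced any rounding error.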
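The exactness of the conversion can be checked with a round trip. This is only a minimal sketch: it assumes a 64-bit IEEE 754 double, it assumes inputs far below 2^63 so the cast back cannot overflow, and converts_exactly is a hypothetical helper name, not a library function.

```cpp
#include <cstdint>

// Hypothetical helper: true when converting `ms` to double and back
// reproduces the original value, i.e. the conversion lost nothing.
// Assumes |ms| is far below 2^63 so the cast back to integer is defined.
bool converts_exactly(std::int64_t ms) {
    const double as_double = static_cast<double>(ms);
    return static_cast<std::int64_t>(as_double) == ms;
}
```

Under those assumptions, converts_exactly(9999999) holds, and 2^53 + 1 is the first positive value for which it fails.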

Why the modulo calculation? What is its purpose? Do you reduce modulo such small numbers in the real calculation?

Upvotes: 1

supercat

Reputation: 81307

There is an fmod function which yields a precise remainder when dividing two double values. One could also get a precise remainder between a long and a double by splitting the long value into two double values which sum to the original, using fmod on each, adding the results, and then adding or subtracting the divisor as needed. Such techniques are only useful, however, if the divisor is itself precisely representable as a double.

If the divisor is representable as a quotient of two integers (e.g. X/Y) whose product will fit in a long, a more accurate approach would be to compute (((N % X)*Y) % X) / Y. That approach will yield the double value which is closest to the numerically perfect result even if the quotient (X/Y) would not be precisely representable. Note that the first N % X could be simplified to N if N * Y will fit in a long, but the formula as given will work correctly whether or not it can.
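A minimal sketch of that formula, assuming non-negative N, positive X and Y, and that X * Y fits in a 64-bit integer. The name rational_mod is made up for illustration, and std::int64_t is used in place of long so the width is the same on every platform.

```cpp
#include <cstdint>

// Hypothetical helper: remainder of n divided by the rational divisor x / y.
// Preconditions (assumed, not checked): n >= 0, x > 0, y > 0, and x * y
// fits in std::int64_t.
double rational_mod(std::int64_t n, std::int64_t x, std::int64_t y) {
    const std::int64_t reduced = n % x;              // reduced < x, so reduced * y < x * y
    const std::int64_t scaled  = (reduced * y) % x;  // same value as (n * y) mod x
    // The only rounding happens in this final division.
    return static_cast<double>(scaled) / static_cast<double>(y);
}
```

For the numbers in the question, 1.42 is 142/100, so rational_mod(9999999, 142, 100) computes 116 / 100.0, which is the double nearest to 1.16.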

Upvotes: 2

eerorika

Reputation: 238461

If you have the choice of using a fixed-point number for b, then you can get the exact value. If you want to convert the exact result (of the fixed-point calculation) to a double, then the accuracy will be limited by the ability of a double to represent the result.

long a_fixed = a * 100; 
long b_fixed = 142; // b * 100
double answer = a_fixed % b_fixed / 100.0;

In the above example, a_fixed % b_fixed is the exact value for (a % b * 100). a must be less than LONG_MAX / 10^2, and the precision of b can be up to 2 decimals. You can relax the latter limitation by multiplying a and b by a higher power of 10. The former limitation can be avoided by using arbitrary precision integers. An implementation of arbitrary precision integers may even provide an interface for fixed precision arithmetic, allowing you to avoid writing the multiplications in my example and just set the appropriate epsilon.
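The limit on a can be made explicit by checking for overflow before scaling. The following is only a sketch: fixed_point_mod and its parameters are hypothetical names, std::int64_t stands in for long, and a is assumed non-negative with b_scaled and scale positive.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: computes a mod (b_scaled / scale) in fixed point.
// For b = 1.42, pass b_scaled = 142 and scale = 100.
// Returns false and leaves `out` untouched if a * scale would overflow.
// Preconditions (assumed, not checked): a >= 0, b_scaled > 0, scale > 0.
bool fixed_point_mod(std::int64_t a, std::int64_t b_scaled,
                     std::int64_t scale, double& out) {
    if (a > std::numeric_limits<std::int64_t>::max() / scale) {
        return false;  // a * scale does not fit in 64 bits
    }
    const std::int64_t a_fixed = a * scale;
    // The integer remainder is exact; only the final division can round.
    out = static_cast<double>(a_fixed % b_scaled) / static_cast<double>(scale);
    return true;
}
```

With a = 9999999, b_scaled = 142 and scale = 100, the integer remainder is 116, so out receives the double nearest to 1.16.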

Upvotes: 2

Pascal Cuoq

Reputation: 80355

In a comment below the question:

"I would expect to receive the value 1.16 and I get a compilation error on this"

You would expect 1.16 because you think you have the value 1.42. You don't. You have the double nearest to 1.42, and while it is close enough to 142/100 (it is exactly 1.4199999999999999289457264239899814128875732421875), subtracting it many times from a large number is going to make a noticeable difference in the end.

In short, there is no way to do what you want (a % b). There is an operation between doubles, fmod(a, b), which does what you say, but you can only use it if you understand that it applies to double values a and b, which are not represented in decimal internally.

Additional notes:

fmod is exact: the result of the mathematical operation it stands for is always representable as a double, and fmod computes exactly this mathematical result. On the other hand, other floating-point operations are not exact, including conversion from decimal in the case of the decimal representation “1.42”.

fmod(9999999.0, 1.42) is computed exactly as 1.16000000050038210019920370541512966156005859375. In this expression, 9999999.0 represents exactly the value 9999999. The error only comes from the difference between 1.42 and 142/100.
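To see this on your own machine, something like the following sketch should do. mod_ms is a hypothetical wrapper name, and it assumes a 64-bit IEEE 754 double and a millisecond count below 2^53 so that the conversion is exact.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical wrapper: remainder of a millisecond count divided by a
// double divisor. The conversion of `ms` is exact for |ms| < 2^53, and
// std::fmod adds no rounding error of its own, so any deviation from the
// "decimal" answer comes from how `divisor` itself was rounded.
double mod_ms(std::int64_t ms, double divisor) {
    return std::fmod(static_cast<double>(ms), divisor);
}
```

Printing mod_ms(9999999, 1.42) with 17 significant digits (for instance with the %.17g format) should show a value slightly above 1.16 rather than 1.16 itself.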

Upvotes: 10
