Reputation: 913
I want to split a range with double borders into N >= 2 equal or near-equal intervals.
I found a suitable function in GNU Scientific Library:
void
make_uniform (double range[], size_t n, double xmin, double xmax)
{
  size_t i;

  for (i = 0; i <= n; i++)
    {
      double f1 = ((double) (n - i) / (double) n);
      double f2 = ((double) i / (double) n);
      range[i] = f1 * xmin + f2 * xmax;
    }
}
However, when

xmin = 241141 (binary 0x410D6FA800000000)
xmax = 241141.0000000001 (binary 0x410D6FA800000003)
N = 3

the function produces

[0x410D6FA800000000, 0x410D6FA800000000, 0x410D6FA800000002, 0x410D6FA800000003]

instead of the desired

[0x410D6FA800000000, 0x410D6FA800000001, 0x410D6FA800000002, 0x410D6FA800000003]
How can I achieve uniformity without resorting to long arithmetic (I already have a long-arithmetic solution, but it is ugly and slow)? Bit twiddling and x86 (x86-64, so no extended precision) assembler routines are acceptable.
UPDATE:
A general solution is needed, without the premise that xmin and xmax have equal exponent and sign: xmin and xmax may be of any value except infinity and NaN (possibly also excluding denormalized values for the sake of simplicity), with xmin < xmax and (1 << 11) - 1 >= N >= 2.
Upvotes: 4
Views: 614
Reputation: 364210
x87 still exists in x86-64, and 64-bit kernels for mainstream OSes do correctly save/restore the x87 state for 64-bit processes. Despite what you may have read, x87 is fully usable in 64-bit code.
Outside of Windows (i.e. in the x86-64 System V ABI used everywhere else), long double is the native 80-bit x87 format. This will probably solve your precision problem for x86 / x86-64 only, if you don't care about portability to ARM / PowerPC / whatever else that has only 64-bit precision in hardware.
Probably best to only use long double
for temporaries inside the function.
I'm not sure what you have to do on Windows to get a compiler to emit 80-bit extended FP math. It's certainly possible in asm, and supported by the kernel, but the toolchain and ABI make it inconvenient to use.
x87 is only somewhat slower than scalar SSE math on current CPUs. 80-bit load/store is extra slow, though, like 4 uops on Skylake instead of 1 (https://agner.org/optimize/) and a few cycles extra latency for fld m80
.
For your loop, which has to convert integers to FP by storing them and using x87 fild, it might be at most a factor of 2 slower than what a good compiler could do with SSE2 for 64-bit double.
And of course long double
will prevent auto-vectorization.
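A sketch of the idea, under the assumption that long double is the 80-bit x87 type (true for GCC/Clang on x86-64 System V, not for MSVC or most ARM targets, where this changes nothing): keep the weights and products in long double temporaries and round only the final result back to double.

```cpp
#include <cstddef>

// Interpolate in extended precision; only the final store rounds to double.
// Assumes long double has a 64-bit significand (x87 80-bit format), so the
// intermediate rounding error stays far below half a double ulp.
void make_uniform_ld(double range[], std::size_t n, double xmin, double xmax)
{
    for (std::size_t i = 0; i <= n; i++) {
        long double f1 = (long double)(n - i) / (long double)n;
        long double f2 = (long double)i / (long double)n;
        range[i] = (double)(f1 * (long double)xmin + f2 * (long double)xmax);
    }
}
```

With the question's inputs (xmin = 241141, xmax = 241141.0000000001, N = 3), the exact interior points are xmin + 1 ulp and xmin + 2 ulp, and the 80-bit intermediate error is small enough that the final rounding lands on them.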
Upvotes: 1
Reputation: 13269
I see two choices: reordering the operations as xmin + (i * (xmax - xmin)) / n, or dealing directly with the binary representations. Here is an example of both.
#include <iostream>
#include <iomanip>
#include <limits>
#include <cstddef>
#include <cstring>

int main() {
    double xmin = 241141;
    double xmax = 241141.0000000001;
    size_t n = 3, i;
    double range[4];
    std::cout << std::setprecision(std::numeric_limits<double>::digits10) << std::fixed;

    // Choice 1: reordered arithmetic.
    for (i = 0; i <= n; i++) {
        range[i] = xmin + (i * (xmax - xmin)) / n;
        std::cout << range[i] << "\n";
    }
    std::cout << "\n";

    // Choice 2: interpolate the integer bit patterns directly.
    // memcpy avoids the strict-aliasing UB of a reinterpret_cast between
    // double& and unsigned long long&.
    unsigned long long uxmin, uxmax;
    std::memcpy(&uxmin, &xmin, sizeof uxmin);
    std::memcpy(&uxmax, &xmax, sizeof uxmax);
    for (i = 0; i <= n; i++) {
        unsigned long long rangei = ((n - i) * uxmin + i * uxmax) / n;
        std::memcpy(&range[i], &rangei, sizeof range[i]);
        std::cout << range[i] << "\n";
    }
}
Upvotes: 0