Reputation: 913
I want to split a range with double borders into N >= 2 equal or near-equal intervals.
I found a suitable function in GNU Scientific Library:
void
make_uniform (double range[], size_t n, double xmin, double xmax)
{
  size_t i;

  for (i = 0; i <= n; i++)
    {
      double f1 = ((double) (n - i) / (double) n);
      double f2 = ((double) i / (double) n);
      range[i] = f1 * xmin + f2 * xmax;
    }
}
However, when

xmin = 241141 (binary 0x410D6FA800000000)
xmax = 241141.0000000001 (binary 0x410D6FA800000003)
N = 3

the function produces

[0x410D6FA800000000, 0x410D6FA800000000, 0x410D6FA800000002, 0x410D6FA800000003]

instead of the desired

[0x410D6FA800000000, 0x410D6FA800000001, 0x410D6FA800000002, 0x410D6FA800000003]
How can I achieve uniformity without resorting to long arithmetic (I already have a long-arithmetic solution, but it is ugly and slow)? Bit twiddling and x86 (x86-64, so no extended precision) assembler routines are acceptable.
UPDATE:
A general solution is needed, without the premise that xmin and xmax have equal exponent and sign: xmin and xmax may be of any value except infinity and NaN (possibly also excluding denormalized values for the sake of simplicity), with xmin < xmax and (1 << 11) - 1 >= N >= 2.
Upvotes: 4
Views: 614
Reputation: 364210
x87 still exists in x86-64, and 64-bit kernels for mainstream OSes do correctly save/restore the x87 state for 64-bit processes. Despite what you may have read, x87 is fully usable in 64-bit code.
Outside of Windows (i.e. in the x86-64 System V ABI used everywhere else), long double is the native 80-bit x87 format. This will probably solve your precision problem for x86 / x86-64 only, if you don't care about portability to ARM / PowerPC / whatever else that has only 64-bit precision in hardware.
Probably best to only use long double
for temporaries inside the function.
I'm not sure what you have to do on Windows to get a compiler to emit 80-bit extended FP math. It's certainly possible in asm, and supported by the kernel, but the toolchain and ABI make it inconvenient to use.
x87 is only somewhat slower than scalar SSE math on current CPUs. 80-bit load/store is extra slow, though, like 4 uops on Skylake instead of 1 (https://agner.org/optimize/) and a few cycles extra latency for fld m80
.
For your loop, which has to convert integers to FP by storing them and using x87 fild, it might be at most a factor of 2 slower than what a good compiler could do with SSE2 for 64-bit double.
And of course long double
will prevent auto-vectorization.
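A sketch of the idea, under the assumption that long double is the 80-bit x87 type (true for GCC/Clang on x86-64 System V, not for MSVC or most ARM targets, where this changes nothing): keep the weights and products in long double temporaries and round only the final result back to double.

```cpp
#include <cstddef>

// Interpolate in extended precision; only the final store rounds to double.
// Assumes long double has a 64-bit significand (x87 80-bit format), so the
// intermediate rounding error stays far below half a double ulp.
void make_uniform_ld(double range[], std::size_t n, double xmin, double xmax)
{
    for (std::size_t i = 0; i <= n; i++) {
        long double f1 = (long double)(n - i) / (long double)n;
        long double f2 = (long double)i / (long double)n;
        range[i] = (double)(f1 * (long double)xmin + f2 * (long double)xmax);
    }
}
```

With the question's inputs (xmin = 241141, xmax = 241141.0000000001, N = 3), the exact interior points are xmin + 1 ulp and xmin + 2 ulp, and the 80-bit intermediate error is small enough that the final rounding lands on them.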
Upvotes: 1
Reputation: 13269
I see two choices: reordering the operations as xmin + (i * (xmax - xmin)) / n, or dealing directly with the binary representations. Here is an example of both.
#include <iostream>
#include <iomanip>
#include <limits>
#include <cstddef>
#include <cstring>

int main() {
    double xmin = 241141;
    double xmax = 241141.0000000001;
    size_t n = 3, i;
    double range[4];
    std::cout << std::setprecision(std::numeric_limits<double>::digits10) << std::fixed;

    // Choice 1: reordered arithmetic.
    for (i = 0; i <= n; i++) {
        range[i] = xmin + (i * (xmax - xmin)) / n;
        std::cout << range[i] << "\n";
    }
    std::cout << "\n";

    // Choice 2: interpolate the integer bit patterns directly.
    // memcpy avoids the strict-aliasing UB of a reinterpret_cast between
    // double& and unsigned long long&.
    unsigned long long uxmin, uxmax;
    std::memcpy(&uxmin, &xmin, sizeof uxmin);
    std::memcpy(&uxmax, &xmax, sizeof uxmax);
    for (i = 0; i <= n; i++) {
        unsigned long long rangei = ((n - i) * uxmin + i * uxmax) / n;
        std::memcpy(&range[i], &rangei, sizeof range[i]);
        std::cout << range[i] << "\n";
    }
}
Upvotes: 0