Reputation: 26085
The standard C math library does not provide a function to compute the CDF of the standard normal distribution, normcdf()
. It does, however, offer closely related functions: the error function, erf()
and the complementary error function, erfc()
. The fastest way to compute the CDF is often via the error function, using the predefined constant M_SQRT1_2 to represent √½:
double normcdf (double a)
{
return 0.5 + 0.5 * erf (M_SQRT1_2 * a);
}
Clearly, this suffers from massive subtractive cancellation in the negative half-plane and is not suitable for the majority of applications. Since the cancellation issue is easily avoided by use of erfc()
, which however is typically of somewhat lower performance than erf()
, the most frequently recommended computation is:
double normcdf (double a)
{
return 0.5 * erfc (-M_SQRT1_2 * a);
}
Some testing shows that the maximum ulp error incurred in the negative half-plane is still rather large however. Using a double-precision implementation of erfc()
accurate to 0.51 ulps, one can observe errors of
up to 1705.44 ulps in normcdf()
. The issue here is that the computational error in the input to erfc()
is magnified by the exponential scaling that is inherent to erfc()
(See this answer
for an explanation of error magnification caused by exponentiation).
The following paper shows how one can achieve (nearly) correctly rounded, products when multiplying floating-point operands with arbitrary-precision constants such as √½:
Nicolas Brisebarre and Jean-Michel Muller, "Correctly rounded multiplication by arbitrary precision constants", IEEE Transactions on Computers, Vol. 57, No. 2, February 2008, pp. 165-174
The method advocated by the paper relies on the fused multiply-add operation, which is available on recent implementations of all common processor architectures, and is exposed in C via the standard math function fma()
. This leads to the following version:
double normcdf (double a)
{
double SQRT_HALF_HI = 0x1.6a09e667f3bcd0p-01; // 7.0710678118654757e-01
double SQRT_HALF_LO = -0x1.bdd3413b264560p-55; // -4.8336466567264567e-17
return 0.5 * erfc (fma (-SQRT_HALF_HI, a, -SQRT_HALF_LO * a));
}
Tests show that this cuts the maximum error roughly in half compared to the previous version. Using the same highly-accurate erfc()
implementation as before, the maximum observed error is 842.71 ulps. This is still far away from the usual goal of providing basic mathematical functions with an error of at most a few ulps.
Is there an efficient method that allows for the accurate computation of normcdf()
, and that uses only functions available in the standard C math library?
Upvotes: 5
Views: 709
Reputation: 26085
One way around the accuracy limitations of the approaches outlined in the question is the limited use of double-double computation. This involves the computation of -sqrt (0.5) * a
as a pair of double
variables h
and l
in head/tail fashion. The high-order portion h
of the product is passed to erfc()
, while the low-order portion l
is then used to interpolate the erfc()
result, based on the local slope of the complementary error function at h
.
The derivative of erfc(x) is -2 * exp (-x * x) / √π. However, one would like to avoid the fairly expensive computation of exp(-x * x). It is known that for x > 0, erfc(x) ~= 2 * exp (-x * x) / (√π * (x + sqrt (x* x + 4/π)). Therefore, asymptotically,
erfc'(x) ~= -2 * x * erfc(x), and it follows that for |l| ≪|h|, erfc (h+l) ~= erfc (h) - 2 * h * l * erfc(h). The negation of the latter term can easily be pulled into the computation of l
. One arrives at the following implementation for double precision (using IEEE-754 binary64
):
double my_normcdf (double a)
{
double h, l, r;
const double SQRT_HALF_HI = 0x1.6a09e667f3bcd0p-01; // 7.0710678118654757e-01
const double SQRT_HALF_LO = -0x1.bdd3413b264560p-55; // -4.8336466567264567e-17
/* clamp input as normcdf(x) is either 0 or 1 asymptotically */
if (fabs (a) > 38.625) a = (a < 0.0) ? -38.625 : 38.625;
h = fma (-SQRT_HALF_HI, a, -SQRT_HALF_LO * a);
l = fma (SQRT_HALF_LO, a, fma (SQRT_HALF_HI, a, h));
r = erfc (h);
if (h > 0.0) r = fma (2.0 * h * l, r, r);
return 0.5 * r;
}
The maximum observed error, using the same erfc()
implementation as used before, is 1.96 ulps. The corresponding single-precision implementation (using IEEE-754 binary32
) is:
float my_normcdff (float a)
{
float h, l, r;
const float SQRT_HALF_HI = 0x1.6a09e6p-01f; // 7.07106769e-1
const float SQRT_HALF_LO = 0x1.9fcef4p-27f; // 1.21016175e-8
/* clamp input as normcdf(x) is either 0 or 1 asymptotically */
if (fabsf (a) > 14.171875f) a = (a < 0.0f) ? -14.171875f : 14.171875f;
h = fmaf (-SQRT_HALF_HI, a, -SQRT_HALF_LO * a);
l = fmaf (SQRT_HALF_LO, a, fmaf (SQRT_HALF_HI, a, h));
r = erfcf (h);
if (h > 0.0f) r = fmaf (2.0f * h * l, r, r);
return 0.5f * r;
}
Upvotes: 3