Reputation: 11
I am trying to understand the mechanics of the logarithm implementation in the GNU C Library that can be found under glibc > sysdeps > ieee754 > flt-32 > e_logf.c
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/flt-32/e_logf.c
Especially the OFF value (0x3f330000) bugs me in two ways.

How is OFF used to get the exponent k? I know the usual approach of interpreting the float's bits as an integer, then shifting and subtracting the bias, e.g. for a (positive) float x = 2^k * z:

int k = (asuint(x) >> 23) - 127;

Here it seems to work the other way round, but I don't understand how that'd work.

Moreover, there's this comment:

/* x = 2^k z; where z is in range [OFF,2*OFF] and exact. [...] */

And I don't understand why z (the mantissa, as I understand it) would lie in that interval and not [1, 2).
There's more I don't get but maybe I can figure it out once I understand what OFF stands for. Any help is appreciated!
Upvotes: 1
Views: 191
Reputation: 26085
The code referenced in the question is very likely based on the assumption that the C type float is mapped to the binary32 type of the IEEE-754 floating-point standard. John Bollinger's answer already addresses the bit-level manipulation of binary32 numbers used during argument reduction.
Without having looked at the code, just based on the code comments in the question, it is clear that OFF represents the lower bound of the interval over which the core approximation employed by the code is valid. Specifically, the bit pattern 0x3f330000, interpreted as a binary32 number, corresponds to the value 0x1.660000p-1, which is exactly 0.69921875. The upper bound is twice the lower bound, or exactly 1.3984375, as the cited code comment points out.
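A quick standalone check of those values (my own sketch, not part of the glibc source) just reinterprets the bit pattern:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main (void)
{
    uint32_t bits = 0x3f330000;  /* OFF from e_logf.c */
    float off;
    memcpy (&off, &bits, sizeof off);  /* reinterpret bit pattern as binary32 */
    printf ("OFF   = %a = %.8f\n", off, off);                /* 0x1.66p-1 = 0.69921875 */
    printf ("2*OFF = %a = %.8f\n", 2.0f * off, 2.0f * off);  /* 0x1.66p+0 = 1.39843750 */
    return 0;
}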
Typical implementations of log ultimately approximate log1p around zero, which means they approximate log around unity. It is numerically advantageous for the lower bound of the core approximation to be >= exp(-0.5) ~= 0.60653 and for the upper bound to be <= exp(0.5) ~= 1.64872, under the restriction that the upper bound is twice the lower bound. Within those constraints the particular choice of cutoff is a design parameter of the log algorithm. It is impossible to know why the code's author made this particular choice; a reasonable guess is that it minimizes the overall numerical error of the implementation.
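To make the constraint concrete: since the upper bound is twice the lower bound, the lower bound must fall in [exp(-0.5), exp(0.5)/2] ~= [0.60653, 0.82436]. A quick check (my own sketch, not part of any library code) that all the cutoffs discussed here, including glibc's OFF, land in that window:

#include <stdio.h>
#include <math.h>

int main (void)
{
    double lo = exp (-0.5);     /* ~0.60653: smallest admissible lower bound */
    double hi = exp (0.5) / 2;  /* ~0.82436: largest, since upper = 2 * lower */
    double cutoff[] = { sqrt (0.5), 0.75, 2.0 / 3.0, 0x1.66p-1 /* glibc OFF */ };
    for (int i = 0; i < 4; i++) {
        printf ("cutoff %.8f admissible? %s\n", cutoff[i],
                ((cutoff[i] >= lo) && (cutoff[i] <= hi)) ? "yes" : "no");
    }
    return 0;
}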
Below I am showing a simple ISO-C99 implementation of logf() in which I experimented with three different design choices for the cutoff value for precisely this reason, ultimately settling on a cutoff of 2/3 as providing the smallest ulp error. This code also contains a portable implementation of the argument reduction using standard C library functions, which I hope makes the bit-level manipulations easier to understand.
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <float.h>
#include <math.h>
/* reinterpret the bits of a float as a 32-bit integer, and vice versa */
int32_t float_as_int (float a) { int32_t r; memcpy (&r, &a, sizeof r); return r; }
float int_as_float (int32_t a) { float r; memcpy (&r, &a, sizeof r); return r; }
/* natural logarithm */
float my_logf (float a)
{
float m, r, s, t, i, f;
int32_t e;
#if LOG_VARIANT == 1
// max ulp err = 0.86280
const float cutoff_f = 0.707106781f;
const int32_t cutoff_i = 0x3f3504f3;
#elif LOG_VARIANT == 2
// max ulp err = 0.88098
const float cutoff_f = 0.750000000f;
const int32_t cutoff_i = 0x3f400000;
#elif LOG_VARIANT == 3
// max ulp err = 0.85089
const float cutoff_f = 0.666666667f;
const int32_t cutoff_i = 0x3f2aaaab;
#endif // LOG_VARIANT
if ((a > 0.0f) && (a <= 0x1.fffffep+127f)) { // 3.40282347e+38f
#if PORTABLE
m = frexpf (a, &e);
if (m < cutoff_f) {
m = m + m;
e = e - 1;
}
i = (float)e;
#else // PORTABLE
i = 0.0f;
/* fix up subnormal inputs */
if (a < 0x1.0p-126f){ // 1.175494351e-38
a = a * 0x1.0p+23f; // 8388608.0
i = -23.0f;
}
e = (float_as_int (a) - cutoff_i) & 0xff800000;
m = int_as_float (float_as_int (a) - e);
i = fmaf ((float)e, 0x1.0p-23f, i); // 1.19209290e-7
#endif // PORTABLE
f = m - 1.0f;
s = f * f;
#if LOG_VARIANT == 1
/* Compute log1p(f) for f in [sqrt(0.5)-1, sqrt(2.0)-1] */
r = -0x1.384000p-4f; // -7.62329102e-2
t = 0x1.084000p-3f; // 1.29028320e-1
r = fmaf (r, s, -0x1.0f218cp-3f); // -1.32388204e-1
t = fmaf (t, s, 0x1.226b04p-3f); // 1.41805679e-1
r = fmaf (r, s, -0x1.5427a4p-3f); // -1.66091233e-1
t = fmaf (t, s, 0x1.99a35ap-3f); // 2.00018600e-1
r = fmaf (r, s, -0x1.000416p-2f); // -2.50015587e-1
r = fmaf (t, f, r);
r = fmaf (r, f, 0x1.55555cp-2f); // 3.33333433e-1
r = fmaf (r, f, -0x1.fffffap-2f); // -4.99999911e-1
#elif LOG_VARIANT == 2
/* compute log1p(f), f in [-0.25, 0.5] */
r = -0x1.738000p-5f; // -4.53491211e-2
t = 0x1.b02000p-4f; // 1.05499268e-1
r = fmaf (r, s, -0x1.0ef3e4p-3f); // -1.32301122e-1
t = fmaf (t, s, 0x1.28c48ap-3f); // 1.44906119e-1
r = fmaf (r, s, -0x1.54d148p-3f); // -1.66414797e-1
t = fmaf (t, s, 0x1.995f4ap-3f); // 1.99888781e-1
r = fmaf (r, s, -0x1.000084p-2f); // -2.50001967e-1
r = fmaf (t, f, r);
r = fmaf (r, f, 0x1.5555d0p-2f); // 3.33335161e-1
r = fmaf (r, f, -0x1.000000p-1f); // -5.00000000e-1
#elif LOG_VARIANT == 3
/* Compute log1p(f) for f in [-1/3, 1/3] */
r = -0x1.0ae000p-3f; // -0.130310059
t = 0x1.208000p-3f; // 0.140869141
r = fmaf (r, s, -0x1.f1988ap-4f); // -0.121483363
t = fmaf (t, s, 0x1.1e5740p-3f); // 0.139814854
r = fmaf (r, s, -0x1.55b36ep-3f); // -0.166846141
t = fmaf (t, s, 0x1.99d8b2p-3f); // 0.200120345
r = fmaf (r, s, -0x1.fffe02p-3f); // -0.249996200
r = fmaf (t, f, r);
r = fmaf (r, f, 0x1.5554fap-2f); // 0.333331972
r = fmaf (r, f, -0x1.000000p-1f); // -0.500000000
#endif // LOG_VARIANT
r = fmaf (r, s, f);
r = fmaf (i, 0x1.62e430p-01f, r); // 6.93147182e-1 // log(2)
} else {
r = a + a; // silence NaNs if necessary
if (a < 0.0f) r = 0.0f/0.0f; // QNaN INDEFINITE
if (a == 0.0f) r = -INFINITY; // -INF
}
return r;
}
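Note that LOG_VARIANT must be defined at compile time, e.g. cc -DLOG_VARIANT=3 -O2 logf.c -lm (file name hypothetical), otherwise the cutoff constants are left undefined. A minimal harness along the following lines, appended to the listing above, can be used to spot-check results against the system's logf(); the test values are my own, chosen to straddle the interval endpoints:

#include <stdio.h>

int main (void)
{
    /* spot checks, including the interval endpoints 0x1.66p-1 and 0x1.66p+0 */
    float test[] = { 0.1f, 0x1.66p-1f, 1.0f, 0x1.66p+0f, 2.718282f, 1.0e30f };
    for (int i = 0; i < (int)(sizeof test / sizeof test[0]); i++) {
        printf ("x=%15.8e  my_logf=%15.8e  logf=%15.8e\n",
                test[i], my_logf (test[i]), logf (test[i]));
    }
    return 0;
}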
Upvotes: 1
Reputation: 180048
Taking the second question first:

    Moreover, there's this comment:
    /* x = 2^k z; where z is in range [OFF,2*OFF] and exact. [...] */
    And I don't understand why z (the mantissa, as I understand it) would lie in that interval and not [1, 2).
With respect to that formula, interpreting the place values of the bits of z's binary representation such that z always falls in any particular non-empty interval is just a question of how you choose k. Thus, that z lies in the particular interval specified in the comment can be taken as a definition. Without such a definition, there are infinitely many pairs of z and k that satisfy the formula.
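As a concrete illustration (mine, not from the linked code), the same value can be decomposed in several ways, and only the interval required of z makes the pair unique:

#include <stdio.h>

int main (void)
{
    float x = 6.0f;
    /* x = 2^k * z has many solutions; the interval chosen for z selects one */
    printf ("x = 2^2 * %g\n", x / 4.0f);  /* z = 1.5:  the [1, 2) convention */
    printf ("x = 2^3 * %g\n", x / 8.0f);  /* z = 0.75: in [OFF, 2*OFF], so e_logf.c picks k = 3 */
    return 0;
}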
Turning to the first question:

    How is OFF used to get the exponent k? I know the usual approach of interpreting the float's bits as an integer, then shifting and subtracting the bias, e.g. for a (positive) float x = 2^k * z:
    int k = (asuint(x) >> 23) - 127;
    Here it seems to work the other way round, but I don't understand how that'd work.
You can shift then subtract or subtract a different, shifted value then shift. You can convert back and forth between the two forms algebraically. It's just distributing a multiplicative operation over an additive one vs. factoring it out.
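A minimal demonstration of that equivalence (my own sketch; it assumes an arithmetic right shift of negative values, just as the glibc code's comment does):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main (void)
{
    float x = 0.8f;  /* any positive normal float */
    int32_t ix;
    memcpy (&ix, &x, sizeof ix);
    /* shift, then subtract the bias ... */
    int k1 = (ix >> 23) - 127;
    /* ... or subtract the pre-shifted bias, then shift */
    int k2 = (int32_t)(ix - (127 << 23)) >> 23;
    printf ("k1 = %d, k2 = %d\n", k1, k2);  /* both -1, since 0.8 = 2^-1 * 1.6 */
    return 0;
}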
The value defined for OFF is equal to ((126 << 23) + 0x330000), which is strictly between 126 << 23 and 127 << 23. The code you linked does this:
tmp = ix - OFF;
[...]
k = (int32_t) tmp >> 23; /* arithmetic shift */
Given the defined value of OFF, that's the same as

k = (ix - (126 << 23) - 0x330000) >> 23;

The value 0x330000 has only 22 significant bits, so the result of that computation will be either equal to (ix >> 23) - 126 (in most cases; does that look familiar?) or one less than that. It's computing the exponent based on a bias of 126.
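A small demonstration of both cases (the helper k_of is mine, not part of glibc):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define OFF 0x3f330000

static int k_of (float x)
{
    uint32_t ix;
    memcpy (&ix, &x, sizeof ix);
    return (int32_t)(ix - OFF) >> 23;  /* arithmetic shift, as in e_logf.c */
}

int main (void)
{
    /* mantissa bits of 1.5f are 0x400000 >= 0x330000: no borrow, k = (ix >> 23) - 126 = 1 */
    printf ("k(1.5) = %d\n", k_of (1.5f));  /* 1, so z = 1.5 / 2 = 0.75 */
    /* mantissa bits of 1.0f are 0x000000 <  0x330000: borrow, one less */
    printf ("k(1.0) = %d\n", k_of (1.0f));  /* 0, so z = 1.0 */
    return 0;
}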
Why 126 instead of 127? I think it's because the computation does not include the implicit leading 1 bit in the mantissa.
The question remains what the role of the extra 0x330000 is. I'm not positive, but I am inclined to guess that it is some sort of tweak parameter.
Upvotes: 1