Reputation: 1
I am trying to implement a small library of mathematical functions for 32-bit floats (simple precision) as part of one of my java projects. When it comes to calculating the sine of very large arguments, I can't manage to implement the Payne-Hanek range-reduction algorithm.
I know there are many threads about this algorithm on stackoverflow, and I've looked at and analyzed various explanatory articles such as ARGUMENT REDUCTION FOR HUGE ARGUMENTS: Good to the Last Bit (K. C. Ng and the members of the FP group of SunPro), so I think I understand it pretty well for double precision float, but after several days of reflection I still can't implement this algorithm in my 32-bits case. Is there a place where I could find a clear explanation of my particular case, and if not, a similar algorithm for my problem? (For the sinus of average argument x, I already use the Cody-Waite reduction, which however becomes very imprecise above 2^23 (about 10e9)).
Thanks in advance for your help, I'm still a programming novice and this is my first question on StackOverflow. I apologize for the possible mistakes in my English as it is not my first language.
EDIT :
In my implementation attempt, I've taken up the method described in ARGUMENT REDUCTION FOR HUGE ARGUMENTS: Good to the Last Bit (https://redirect.cs.umbc.edu/~phatak/645/supl/Ng-ArgReduction.pdf), which I've tried to modify for 32-bit floats as follows (I won't re-explain the principle of the algorithm to avoid a lengthy message):
The result is a 23-bit mantissa for float x, with an exponent between -128 and 127. In the case of simple precision, I have considered that the float closest to a multiple of pi/2 y is at least 2^-31 (It might be 30 or 29 but I never explicitly calculated it). For our calculation, we must therefore store at least T bits of 2/pi with :
T = max_exponent + mantissa + bits_for_worst_reduction_case = 23 + 127 + 31 = 181.
Since these bits will be stored as int (32bits), I choose to directly store 192 of them (i.e. 6 blocks).however, I use this algorithm for floats greater than 2^23 (min_exponent = 23). Posing M the number of trailing zeros before the binary point, the only part used among all these stored bits is the bits between the M-1st and the M+bits_for_worst_reduction_case+min_exposant = M+54th
(the reason for using only these bits is very well described in the paper presented above, but quickly :
The bits preceding the M-2nd included are such that x * these bits = 4*N where N is an odd integer by definition of M.
The bits following the M+55th are such that x*these bits <= 2^32 (so it doesn't matter in our calculation, even in the worst-case scenario).
The only thing that remains is to calculate x * B. The method is explained in the article, with the difference that here, in order to obtain perfectly equal result, we need to separate x and B into x_i and b_j blocks of at most 10 bits in length, so that x_i * b_j is a 20 significant bit number and the final multiplication (which for each result block y_i requires 2 additions of x_i * b_j sub-blocks) is a n = 22 significant bit number (so that we have n<23 and those bits can fit in a float mantissa).
Now that we have the results block y_i, we can have the remainder r of x%(pi/2) with :
y_i = y_i%4
y = sum(y_i, 1<i<6)
t = (y1-y) + sum(y_i, 2<i<6)
k = round(y)
f = (y-k) + t
r = f*pi/2.
My attempt of implementation is below. There is a few mistakes :
I do not know how to compute the bits of 2OverPi=0.6366... . Should I put them as the int 6366... or as the bits of the float 0.6366...
I can't manage to compute the number of trailing zeros before the binary point of my float. My attempt is in the countTrailingZeros() method.
public static final float piOver2 = 1.57079637051f; //0x3fc90fdb;
/* 192 bits de Pi/2 for reduction : */
public static final int TwoOverPiBits[] = {
0x00000000,
0x28be60db,
0x9391054a,
0x7f09d5f4,
0x7d4d3770,
0x36d8a566,
0x4f10e410
};
// return the exponant of a float in simple precision
public int getExp(float x) {
int bits = Float.floatToIntBits(x);
int exp = ((bits >> 23) & 0xFF) - 127;
return exp;
}
// return the mantissa of a float in simple precision
public int getMant(float x) {
int bits = Float.floatToIntBits(x);
int mant = bits & 0x7fffff;
return mant;
}
// return the sign of a float in simple precision
public int getSgn(float x) {
int bits = Float.floatToIntBits(x);
int sgn = (bits >> 31) & 1;
return sgn;
}
// Method to extract 55 bits from TwoOverPiBits for the index k and stock them in 10 bits blocks
public int[] packBits(int k) {
int numPackets = 6;
int[] packets = new int[numPackets];
int bitIndex = k; // Index of first bit
for (int i = 0; i < numPackets; i++) {
packets[i] = extract12Bits(bitIndex);
bitIndex += 10;
}
return packets;
}
// Method to extract 10 bits from TwoOverPiBits at a given index
public int extract12Bits(int bitIndex) {
int intIndex = bitIndex / 32; // Index de l'entier dans le tableau
int bitOffset = bitIndex % 32; // Décalage du bit dans l'entier
int result;
if (bitOffset <= 22) {
// All bits are in the same TwoOverPiBits block
result = (deuxSurPiBits[intIndex] >> bitOffset) & 0x3FF; // Masqk to extract 10 bits
} else {
// 10 bits are in two cosecutive TwoOverPiBits blocks
int bitsFirstPart = deuxSurPiBits[intIndex] >>> bitOffset;
int bitsSecondPart = deuxSurPiBits[intIndex + 1] & ((1 << (10 - (32 - bitOffset))) - 1);
result = (bitsSecondPart << (32 - bitOffset)) | bitsFirstPart;
}
return result;
}
//Methode to separate a float in 4 10 bits blocks
public int[] extract12BitPackets(float x) {
int bits = Float.floatToIntBits(x);
int[] packets = new int[4];
// Extract the 4 blocks : 32 bits in 2-10-10-10
// First block : 10 bits but only 2 that are informative : 1 sign bit and the first exposant bit
// Second block : 7 exposant bits and 3 mantissa bits
packets[1] = (bits >> 20) & 0x3FF;
// third block : 10 mantissa bits
packets[2] = (bits >> 10) & 0x3FF;
// last block : 10 last mantissa bits
packets[3] = (bits & 0x3FF);
return packets;
}
//Method to count number of trailing zeros before the binary point
public int countTrailingZeros(int number) {
if (number == 0) return 32;
int count = 0;
while ((number & 1) == 0) {
count++;
number >>= 1;
}
return count;
}
public float payneHanek(float x) {
if (x<0){
return -payneHanek(-x);
}
int q;
float f, r;
int k = getExp(x);
int mant = getMant(x);
int M = (k - 23) + countTrailingZeros(mant);
int[] bits552Overpi = packBits(M - 6); //We begin at M-6 but we then erase the first five bits with the &0x3FF
int 2Overpibits5 = bits552Overpi[5];
int 2Overpibits4 = bits552Overpi[4];
int 2Overpibits3 = bits552Overpi[3];
int 2Overpibits2 = bits552Overpi[2];
int 2Overpibits1 = bits552Overpi[1];
int 2Overpibits0 = bits552Overpi[0] & 0x3FF;
//Separate x in 10 bits block
int[] bits32x = extract12BitPackets(x);
int xbits3 = bits32x[3];
int xbits2 = bits32x[2];
int xbits1 = bits32x[1];
int xbits0 = bits32x[0];
//x*2OverPi with the blocks
int y8 = xbits3 * 2Overpibits5;
int y7 = xbits3 * 2Overpibits4 + xbits2 * 2Overpibits5;
int y6 = xbits3 * 2Overpibits3 + xbits2 * 2Overpibits4 + xbits1 * 2Overpibits5;
int y5 = xbits3 * 2Overpibits2 + xbits2 * 2Overpibits3 + xbits1 * 2Overpibits4 + xbits0 * 2Overpibits5;
int y4 = xbits3 * 2Overpibits1 + xbits2 * 2Overpibits2 + xbits1 * 2Overpibits3 + xbits0 * 2Overpibits4;
int y3 = xbits3 * 2Overpibits0 + xbits2 * 2Overpibits1 + xbits1 * 2Overpibits2 + xbits0 * 2Overpibits3;
int y2 = xbits2 * 2Overpibits0 + xbits1 * 2Overpibits1 + xbits0 * 2Overpibits2;
int y1 = xbits1 * 2Overpibits0 + xbits0 * 2Overpibits1;
int y0 = xbits0 * 2Overpibits0;
//Modulo of each blocks : those variables are exact
y8 = y8 % 4;
y7 = y7 % 4;
y6 = y6 % 4;
y5 = y5 % 4;
y4 = y4 % 4;
y3 = y3 % 4;
y2 = y2 % 4;
y1 = y1 % 4;
y0 = y0 % 4;
//final result :
float y = y0 + (y1 + (y2 + (y3 + (y4 + (y5 + (y6 + (y7 + y8)))))));
//error :
int error = (((((((y1 - y) + y2) + y3) + y4) + y5) + y6) + y7) + y8;
q = (int) Math.rint(y);
f = (y - q) + error;
r = f * piSur2;
//Calculate sin(r)....
Upvotes: 0
Views: 165