Desperados
Desperados

Reputation: 434

IEEE754 single precision - General algorithm for representing the half of a number

Suppose N is an arbitrary number represented according to IEEE754 single precision standards. I want to find the most precise possible representation of N/2 again in IEEE754.

I want to find a general algorithm (described in words, I just want the necessary steps and cases to take into account) for obtaining the representation.

My approach is :

Say the number is represented: b0_b1_b2_b3...b_34.

  1. Isolate the first bit which determines the sign (-/+) of the number.
  2. Calculate the representation of the power (p) from the unsigned representation b_1...b_11.
  3. If power = 128 we have a special case. If all the bits of the mantissa are equal to 0, we have, depending on b_0, either minus or plus infinity. We don't change anything. If the mantissa has at least one bit equal to 1 then we have NaN value. Again we change nothing.
  4. if e is inside]-126, 127[then we have a normalized mantissam. The new power p can be calculated asp' = p - 1and belongs in the interval]-127, 126]. We then calculatem/2` and we represent it starting from the right and losing any bits that cannot be included in the 23 bits of the mantissa.
  5. If e = -126, then in calculating the half of this number we pass in a denormalized mantissa. We represent p = 127, calculate the half of the mantissa and represent it again starting from the right losing any information that cannot be included.
  6. Finally if e = -127 we have a denormalized mantissa. As long as m/2 can be represented in the number of bits available in the mantissa without losing information we represent that and keep p = -127. In any other case we represent the number as a positive or negative 0 depending on b_0

Any steps I have missed, any improvements ( I am sure there are ) that can be made or anything that seems completely wrong?

Upvotes: 4

Views: 136

Answers (1)

David Eisenstat
David Eisenstat

Reputation: 65458

I implemented a divide by two algorithm in Java and verified it for all 32-bit inputs. I tried to follow your pseudocode, but there were three places where I diverged. First, the infinity/NaN exponent is 128. Second, in case 4 (normal -> normal), there's no need to operate on the fraction. Third, you didn't describe how round half to even works when you do operate on the fraction. LGTM otherwise.

public final class FloatDivision {
  public static float divideFloatByTwo(float value) {
    int bits = Float.floatToIntBits(value);
    int sign = bits >>> 31;
    int biased_exponent = (bits >>> 23) & 0xff;
    int exponent = biased_exponent - 127;
    int fraction = bits & 0x7fffff;
    if (exponent == 128) {
      // value is NaN or infinity
    } else if (exponent == -126) {
      // value is normal, but result is subnormal
      biased_exponent = 0;
      fraction = divideNonNegativeIntByTwo(0x800000 | fraction);
    } else if (exponent == -127) {
      // value is subnormal or zero
      fraction = divideNonNegativeIntByTwo(fraction);
    } else {
      // value and result are normal
      biased_exponent--;
    }
    return Float.intBitsToFloat((sign << 31) | (biased_exponent << 23) | fraction);
  }

  private static int divideNonNegativeIntByTwo(int value) {
    // round half to even
    return (value >>> 1) + ((value >>> 1) & value & 1);
  }

  public static void main(String[] args) {
    int bits = Integer.MIN_VALUE;
    do {
      if (bits % 0x800000 == 0) {
        System.out.println(bits);
      }
      float value = Float.intBitsToFloat(bits);
      if (Float.floatToIntBits(divideFloatByTwo(value)) != Float.floatToIntBits(value / 2)) {
        System.err.println(bits);
        break;
      }
    } while (++bits != Integer.MIN_VALUE);
  }
}

Upvotes: 2

Related Questions