Reputation: 115

Bounds checking a float from only an unsigned bit string

I am trying to return the result of casting a single precision float to an int just from its bit string. I correctly grab the sign bit, exponent and mantissa from the bit string but I am unsure how to ensure a float is in the range (0x80000000<float<0x7fffffff) solely from its bit string.

Additionally, I am confused about when I should round the result to 0. It appears that casting to an int does round to 0, I just don't know when or why this occurs?

Here is my code so far:

int convertFloatToAnInt(unsigned f) {
//trying to return (int)f
  int signBit = (f >> 31) & 1;
  int exponent = (f >> 23) & 0xFF;
  int mantissa = f & 0x7FFFFF;
  exponent = exponent - 127;
 
  int truncatedFloat = 0;
  //truncate based on the exponent with decimal at 23rd bit
  truncatedFloat = (mantissa>>(23-exponent));
    
  //add the implicit 1 back to the mantissa
  truncatedFloat |= (1<<exponent);

  //change the sign if needed
  if (signBit) {
    truncatedFloat = ~(truncatedFloat)+1;
  }

  return truncatedFloat;//(int)f
}

Upvotes: 2

Answers (3)

ikegami

Reputation: 386371

Assuming float uses that format, you can safely perform the type conversion using a union.

uint32_t u = ...;

float f = ( ( union { uint32_t u; float f; } ){ .u = u } ).f;

memcpy could also be used.
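A minimal sketch of the memcpy alternative, assuming float is 32 bits wide and uses the IEEE 754 single-precision format. The helper name bitsToFloat is made up for illustration:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as a float.
   Assumes sizeof(float) == sizeof(uint32_t). */
float bitsToFloat(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);  /* copy the object representation byte for byte */
    return f;
}
```

Under those assumptions, bitsToFloat(0x3F800000u) yields 1.0f.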

Upvotes: 1

Eric Postpischil

Reputation: 223494

I am unsure how to ensure a float is in the range (0x80000000<float<0x7fffffff) solely from its bit string.

You can use 0 <= f && f < 0x4F000000u || 0x80000000u <= f && f <= 0xCF000000u, where f is the unsigned bit string (the parameter of your function), not a float.

The encodings of IEEE-754 binary32 floating-point numbers are in ascending order for non-negative numbers and descending order for negative numbers. I assume you are using binary32 for float and you have the bit strings in the proper order.

The bit patterns for +0 and +2³¹ are 00000000₁₆ and 4F000000₁₆. So a float is in [+0, +2³¹) iff its bit string is in [00000000₁₆, 4F000000₁₆).

The patterns for −0 and −2³¹ are 80000000₁₆ and CF000000₁₆. So a float is in [−2³¹, −0] iff its bit string is in [80000000₁₆, CF000000₁₆].
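The two intervals above can be wrapped in a small predicate. This is only a sketch, assuming binary32 and a 32-bit int; the name fitsInInt32 is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the binary32 value encoded by `bits` lies in [-2^31, 2^31).
   Assumes IEEE-754 binary32 encoding. */
bool fitsInInt32(uint32_t bits)
{
    /* Non-negative values: encodings ascend from +0 (0x00000000)
       up to, but not including, +2^31 (0x4F000000). */
    if (bits < 0x4F000000u)
        return true;
    /* Negative values: encodings run from -0 (0x80000000)
       through -2^31 (0xCF000000), inclusive. */
    return 0x80000000u <= bits && bits <= 0xCF000000u;
}
```

Infinities and NaNs are rejected automatically, because their encodings (0x7F800000 and above, 0xFF800000 and above) fall outside both intervals.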

It appears that casting to an int does round to 0, I just don't know when or why this occurs?

Conversion of a floating-point type to an integer type is specified by C 2018 6.3.1.4 1, and it says the value is truncated toward zero.
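As a small illustration of that rule (a sketch only; truncateToInt is a made-up wrapper around the built-in conversion):

```c
/* Hypothetical wrapper around the built-in conversion. The caller must
   ensure x is finite and that its truncated value fits in an int. */
int truncateToInt(float x)
{
    return (int)x;  /* the fractional part is discarded */
}
```

Here truncateToInt(2.9f) gives 2, truncateToInt(-2.9f) gives -2, and any x with magnitude below 1, such as 0.5f or -0.5f, gives 0.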

Upvotes: 0

nielsen

Reputation: 7719

I will assume the IEEE 754 single-precision representation of float and 32-bit width for int and unsigned as indicated in the question.

The int has range [-2^31; 2^31-1].

The significand in the float is essentially a 24-bit value multiplied by 2^-23 (i.e. shifted 23 places to the right). This will fit within 31 bits if it is shifted at most 30 places to the left. Hence, if the exponent is at most 30, then the value will fit into an int. However, there is one case where the exponent is 31 that also fits: the value -2^31, i.e. the sign bit is set, the exponent is 31 and the mantissa is 0.

When doing the conversion, these checks must be made before doing the bit-shifting to avoid undefined behaviour. Also, it must be checked if the mantissa should be shifted left or right since shifting a negative amount is undefined behaviour.

Thus an implementation could be:

int toInt(unsigned f)
{
  int signBit = (f >> 31) & 1;
  int exponent = (f >> 23) & 0xFF;
  int mantissa = f & 0x7FFFFF;
  exponent = exponent - 127;

  int result = 0;
  if(exponent >= 0)
  {
    if(exponent <= 30)
    { // The result is within range
      int mantissa1 = mantissa | (1u << 23);
      if(exponent <= 23)   // Need to shift right
        result = mantissa1 >> (23-exponent);
      else                 // Need to shift left
        result = mantissa1 << (exponent-23);  
      if(signBit) result = -result;
    }
    else
    { // Probably out of range, but check for -2^31
      if(exponent == 31 && signBit && mantissa == 0)
        result = -2147483647 - 1;  // -2^31, written without an unsigned-to-int conversion
      else
        printf("Error: out of range");
    }
  }
  // else the absolute value is less than 1, so the result is 0
  return result;
}

Note: calling this function requires passing the bit pattern of a float as an unsigned. This can be done as follows:

#include <math.h>
#include <stdio.h>
#include <string.h>

int main()
{
    float x = -pow(2.0, 31);
    unsigned xBits;
    memcpy(&xBits, &x, sizeof xBits);
    printf("%d, %d", (int)x, toInt(xBits));

    return 0;
}

It is important to notice that (int)x will be undefined behaviour if x does not fit in an int. The function toInt() will not have undefined behaviour for any value of x. However, toInt() relies heavily on implementation-defined behaviour such as the representation of float.

Upvotes: 2
