Reputation: 115
I am trying to return the result of casting a single precision float to an int just from its bit string. I correctly grab the sign bit, exponent and mantissa from the bit string but I am unsure how to ensure a float is in the range (0x80000000<float<0x7fffffff) solely from its bit string.
Additionally, I am confused when I should round the result to 0? It appears that casting to an int does round to 0, I just don't know when or why this occurs?
Here is my code so far:
int convertFloatToAnInt(unsigned f) {
    //trying to return (int)f
    int signBit = (f >> 31) & 1;
    int exponent = (f >> 23) & 0xFF;
    int mantissa = f & 0x7FFFFF;
    exponent = exponent - 127;
    int truncatedFloat = 0;
    //truncate based on the exponent with decimal at 23rd bit
    truncatedFloat = (mantissa>>(23-exponent));
    //add the implicit 1 back to the mantissa
    truncatedFloat |= (1<<exponent);
    //change the sign if needed
    if (signBit) {
        truncatedFloat = ~(truncatedFloat)+1;
    }
    return truncatedFloat;//(int)f
}
Upvotes: 2
Views: 125
Reputation: 386371
Assuming float uses that format, you can safely perform the type conversion using a union.
uint32_t u = ...;
float f = ( ( union { uint32_t u; float f; } ){ .u = u } ).f;
memcpy could also be used.
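The memcpy route mentioned above could be sketched as follows. This is a minimal illustration that assumes float is IEEE-754 binary32 and has the same size as uint32_t; the helper names bits_to_float and float_to_bits are made up for this example and do not come from the question.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as a float.
   Assumes float is IEEE-754 binary32 (same size as uint32_t). */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);    /* copies the bytes, no pointer aliasing */
    return f;
}

/* Hypothetical helper: the inverse, float to its 32-bit pattern. */
uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Under those assumptions, bits_to_float(0x3F800000u) yields 1.0f and float_to_bits(-2.0f) yields 0xC0000000.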
Upvotes: 1
Reputation: 223494
I am unsure how to ensure a float is in the range (0x80000000<float<0x7fffffff) solely from its bit string.
You can use (0 <= f && f < 0x4F000000u) || (0x80000000u <= f && f <= 0xCF000000u), where f is the unsigned bit pattern.
The encodings of IEEE-754 binary32 floating-point numbers are in ascending order for non-negative numbers and descending order for negative numbers. I assume you are using binary32 for float and you have the bit strings in the proper order.
The bit patterns for +0 and +2^31 are 0x00000000 and 0x4F000000. So a float is in [+0, +2^31) iff its bit string is in [0x00000000, 0x4F000000).
The patterns for −0 and −2^31 are 0x80000000 and 0xCF000000. So a float is in [−2^31, −0] iff its bit string is in [0x80000000, 0xCF000000].
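The two intervals above translate directly into a predicate on the bit pattern. The sketch below assumes binary32 for float and a 32-bit int; the function name float_bits_fit_in_int is invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the binary32 value encoded by `bits`
   can be converted to a 32-bit int without going out of range.
   Assumes IEEE-754 binary32 and a 32-bit int. */
bool float_bits_fit_in_int(uint32_t bits)
{
    /* Non-negative values: bit patterns ascend with the value,
       so [+0, +2^31) is [0x00000000, 0x4F000000). */
    bool non_negative_ok = bits < 0x4F000000u;

    /* Negative values: bit patterns ascend as the value descends,
       so [-2^31, -0] is [0x80000000, 0xCF000000]. */
    bool negative_ok = bits >= 0x80000000u && bits <= 0xCF000000u;

    return non_negative_ok || negative_ok;
}
```

Infinities and NaNs are rejected automatically, because their patterns (exponent field 0xFF) lie above 0x4F000000 on the positive side and above 0xCF000000 on the negative side.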
It appears that casting to an int does round to 0, I just don't know when or why this occurs?
Conversion of a floating-point type to an integer type is specified by C 2018 6.3.1.4 1, and it says the value is truncated toward zero.
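To make the truncation rule concrete, here is a tiny sketch. The wrapper truncate_to_int is a hypothetical name used only so the cast can be called on a few sample values; it does nothing beyond the ordinary cast.

```c
/* Hypothetical wrapper around the ordinary cast, for illustration only.
   The fractional part is discarded, so the result moves toward zero
   for both positive and negative inputs. */
int truncate_to_int(float x)
{
    return (int)x;
}
```

With this, truncate_to_int(2.9f) gives 2 and truncate_to_int(-2.9f) gives -2 (not -3), while any value strictly between -1 and 1 gives 0. That last case is the "rounds to 0" behaviour the question observed.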
Upvotes: 0
Reputation: 7719
I will assume the IEEE 754 single-precision representation of float and 32-bit width for int and unsigned as indicated in the question.
The int has range [-2^31; 2^31-1].
The significand in the float is essentially a 24-bit value multiplied by 2^-23 (i.e. shifted 23 places to the right). This will fit within 31 bits if it is shifted at most 30 places to the left. Hence, if the exponent is at most 30, then the value will fit into an int. However, there is one case where the exponent is 31 which also fits: if the value is -2^31, i.e. the sign bit is set and the exponent is 31 and the mantissa is 0.
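As a worked illustration of the reasoning above, the following sketch splits a bit pattern into sign, unbiased exponent, and 24-bit significand. The struct and function names are hypothetical, and the decoding is only meaningful for normal numbers (exponent field neither 0 nor 0xFF) under the binary32 assumption.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields of a binary32 value. */
struct float_fields {
    int sign;             /* 1 if the sign bit is set, else 0 */
    int exponent;         /* unbiased exponent (exponent field minus 127) */
    uint32_t significand; /* 24-bit significand, implicit leading 1 restored */
};

/* Hypothetical helper: decode a normal binary32 bit pattern.
   The encoded magnitude is significand * 2^(exponent - 23). */
struct float_fields decode_float_bits(uint32_t bits)
{
    struct float_fields out;
    out.sign = (int)(bits >> 31);
    out.exponent = (int)((bits >> 23) & 0xFFu) - 127;
    out.significand = (bits & 0x7FFFFFu) | (1u << 23);
    return out;
}
```

For example, 12.5f has the pattern 0x41480000, which decodes to exponent 3 and significand 0xC80000. Shifting the significand right by 23 - 3 = 20 places leaves 12, the truncated integer value. The pattern 0xCF000000 decodes to sign 1, exponent 31, and a significand of 0x800000 (mantissa bits all zero), which is the special case -2^31.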
When doing the conversion, these checks must be made before doing the bit-shifting to avoid undefined behaviour. Also, it must be checked if the mantissa should be shifted left or right since shifting a negative amount is undefined behaviour.
Thus an implementation could be:
#include <limits.h>
#include <stdio.h>

int toInt(unsigned f)
{
    int signBit = (f >> 31) & 1;
    int exponent = (f >> 23) & 0xFF;
    int mantissa = f & 0x7FFFFF;
    exponent = exponent - 127;          // remove the exponent bias
    int result = 0;
    if(exponent >= 0)
    {
        if(exponent <= 30)
        {   // The result is within range
            int mantissa1 = mantissa | (1u << 23);  // restore the implicit leading 1
            if(exponent <= 23)  // Need to shift right (drops the fractional bits)
                result = mantissa1 >> (23-exponent);
            else                // Need to shift left
                result = mantissa1 << (exponent-23);
            if(signBit) result = -result;
        }
        else
        {   // Probably out of range (this includes infinities and NaNs), but check for -2^31
            if(exponent == 31 && signBit && mantissa == 0)
                result = INT_MIN;   // -2^31, assuming a 32-bit int
            else
                printf("Error: out of range");
        }
    }
    // else the absolute value is less than 1, so the result is 0
    return result;
}
Note: calling this function requires passing the bit pattern of a float as an unsigned. This can be done as follows:
#include <math.h>
#include <stdio.h>
#include <string.h>

int main()
{
    float x = -pow(2.0, 31);
    unsigned xBits;
    memcpy(&xBits, &x, sizeof xBits);   // copy the bit pattern of x into xBits
    printf("%d, %d", (int)x, toInt(xBits));
    return 0;
}
It is important to notice that (int)x will be undefined behaviour if x does not fit in an int. The function toInt() will not have undefined behaviour for any value of x. However, of course, toInt() relies heavily on implementation-defined behaviour such as the representation of float.
Upvotes: 2