Ant

Reputation: 55

float round-off in c++

int a,b;
a = 2147483647;
b = 1000;
printf("%.2f",(float)a/(float)b);

This should print 2147483.65, but instead it prints 2147483.75. Please help.

Upvotes: 1

Views: 257

Answers (3)

BusyProgrammer

Reputation: 2791

The problem here is that a float does not have enough precision to store this value exactly. I'll explain the issue by taking a broader look at the problem.

2147483647 is the largest number that can be stored in a 32-bit signed int variable. In binary it is 31 consecutive 1 bits. Now, even though a float is also typically 4 bytes (32 bits), it splits those bits between a sign, an exponent, and a significand, which leaves only 24 bits of precision for the digits of the value. Therefore, you face a problem in the following line:

printf("%.2f",(float)a/(float)b);

Because when you cast a to a float, you effectively do (float)(2147483647). This number needs 31 significant bits (the 32nd bit of the int is the sign; here it is zero, since the number is positive), but a float keeps only 24, so the value is rounded to the nearest representable float, 2147483648. This is a loss of precision rather than an overflow. Dividing by 1000 then gives 2147483.648, and since floats of that magnitude are spaced 0.25 apart, the result is rounded again to 2147483.75, which is the output you are seeing.

So, to fix this, you should cast a to double. It is unnecessary to cast b as well: casting one operand of an arithmetic operation is enough, because the compiler converts the other operand to the same type and performs the division in double.
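To see where the precision goes, here is a minimal sketch that repeats the question's cast and division step by step. It assumes float is IEEE 754 single precision (true on most platforms, but not guaranteed by the standard), and the helper names to_float and divide_as_float are made up for illustration.

```cpp
#include <cstdio>

// Converts an int to float the same way (float)a does in the question.
float to_float(int value) {
    return static_cast<float>(value);
}

// The question's expression: both operands are float, so the
// division itself is also carried out in float.
float divide_as_float(int a, int b) {
    return static_cast<float>(a) / static_cast<float>(b);
}

// Prints the intermediate values so the two rounding steps are visible.
void print_float_steps(int a, int b) {
    std::printf("a as float:     %.2f\n", to_float(a));
    std::printf("a / b in float: %.2f\n", divide_as_float(a, b));
}
```

Calling print_float_steps(2147483647, 1000) from main should show the cast already landing on 2147483648.00 before the division happens, which is where the wrong digits come from.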

So simply by doing:

printf("%.2f",(double)a/b);

This performs the division in double and gives you the desired result, 2147483.65.

By the way, just as an aside, a double typically occupies 8 bytes, twice the size of a float, and has 53 bits of precision instead of 24. Therefore, it is more practical to use double in operations where precision matters.
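If you want to check the precision of both types on your own compiler, something like the following sketch should do. It assumes IEEE 754 float and double; the names divide_as_double, format_two_decimals, kFloatBits and kDoubleBits are just illustrative.

```cpp
#include <cstdio>
#include <limits>
#include <string>

// Number of significand bits each type carries (24 and 53 under IEEE 754).
constexpr int kFloatBits = std::numeric_limits<float>::digits;
constexpr int kDoubleBits = std::numeric_limits<double>::digits;

// Casting one operand to double is enough; b is converted automatically.
double divide_as_double(int a, int b) {
    return static_cast<double>(a) / b;
}

// Formats a value the way printf("%.2f", ...) would print it.
std::string format_two_decimals(double value) {
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%.2f", value);
    return std::string(buffer);
}
```

With the values from the question, format_two_decimals(divide_as_double(2147483647, 1000)) yields the string 2147483.65, since 53 bits are more than enough to hold a 31-bit integer exactly.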

Upvotes: 0

Avinash

Reputation: 2191

Replace float with double:

#include <cstdio>

int main() {
    long a, b;
    a = 2147483647;
    b = 1000;
    // Casting one operand is enough: a / (double)b gives the same result.
    printf("%.2f", (double)a / (double)b);
    return 0;
}

Output: 2147483.65

This is because a float has only about 7 significant decimal digits of precision, and those digits have to cover the fractional part as well as the integer part.

Upvotes: -2

VHS

Reputation: 10174

Cast a and b to double instead of float.

Reason:

The value of a in your program is 2147483647, which is the highest value that can be stored in a 4-byte (32-bit) signed integer type such as int. A float, even though it is also a 32-bit data type, cannot store it exactly: some of its bits go to the exponent, only 24 bits are left for the significant digits, and so the value gets rounded. A double is 8 bytes long with 53 bits of precision, and hence can easily accommodate 2147483647 exactly and 2147483.65 to well beyond two decimal places.
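Another way to look at it is to measure how far apart neighbouring values are for each type. A rough sketch using std::nextafter from <cmath>, assuming IEEE 754 types (float_gap and double_gap are illustrative names, not standard functions):

```cpp
#include <cmath>
#include <limits>

// Distance from x to the next representable float above it.
float float_gap(float x) {
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// Distance from x to the next representable double above it.
double double_gap(double x) {
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}
```

Around 2147483.5 the float gap is 0.25, so a float can only land on values ending in .00, .25, .50 or .75 there, while the double gap at the same magnitude is 2^-31, roughly 4.7e-10.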

Upvotes: 3
