shawn_halayka

Reputation: 106

In C++, what is the standard behaviour of static_cast<float>(some_double_variable)?

I can't ask new questions now, so I'm redoing this question:

How would I go about manually constructing a float from the value of a double, without casting?

Assume that the double variable's range is 0 to 1 inclusive.

I tried this, but it doesn't quite work like a cast would:

double truncate_normalized_double(double d)
{
    if (d <= 0.0)
        return 0.0;
    else if (d >= 1.0)
        return 1.0;

    static const double epsilon = pow(2.0, -23.0);
    const double remainder = fmod(d, epsilon);

    d += remainder;

    return d;
}

Any insight on how the cvtsd2ss instruction works under the hood?

Upvotes: 1

Views: 224

Answers (2)

shawn_halayka

Reputation: 106

This works:

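#include <cmath>

double truncate_normalized_double(double d)
{
    // Clamp to the assumed input range of 0 to 1 inclusive.
    if (d <= 0.0)
        return 0.0;
    else if (d >= 1.0)
        return 1.0;

    static const long long signed int mantissa_bits = 23;

    // 2^-23: the spacing of floats in [1, 2).
    static const double epsilon = pow(2, -mantissa_bits);
    const double remainder = fmod(d, epsilon);

    // nexttowardf takes a float as its first parameter, so d is
    // implicitly converted to float at this call.
    d = nexttowardf(d, d - remainder);

    return d;
}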

The point is to not explicitly use a cast, and it does the job. Still, is there a better version? Thank you for all of your input, people.
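
One possible alternative, as a minimal sketch rather than a definitive implementation: round the double onto float's grid using only double arithmetic, with no conversion to float at all. The name round_to_float_precision is hypothetical. The sketch assumes IEEE 754 formats (24 significant bits for float, smallest float subnormal 2^-149), the default round-to-nearest-even rounding mode, and the same 0 to 1 input range as above.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper (not from the original post): rounds d to the nearest
// value representable as an IEEE 754 binary32 float, but keeps it in a double.
// Assumes the default round-to-nearest-even rounding mode.
double round_to_float_precision(double d)
{
    if (d <= 0.0)
        return 0.0;
    else if (d >= 1.0)
        return 1.0;

    // d == m * 2^exponent with m in [0.5, 1); only the exponent is needed.
    int exponent = 0;
    std::frexp(d, &exponent);

    // A float keeps 24 significant bits, so its spacing near d is
    // 2^(exponent - 24), but never finer than the smallest subnormal, 2^-149.
    const int ulp_exponent = std::max(exponent - 24, -149);

    // Scale so that one float step equals 1.0, round to an integer, scale back.
    // Both scalings are by powers of two and therefore exact.
    const double scaled = std::ldexp(d, -ulp_exponent);
    return std::ldexp(std::nearbyint(scaled), ulp_exponent);
}
```

On a typical x86-64 build the result should compare equal to static_cast<float>(d) widened back to double, because cvtsd2ss rounds according to the current MXCSR rounding mode, which defaults to round-to-nearest-even.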

Upvotes: 0

Tony Delroy

Reputation: 106236

The relevant behaviours are described in the Standard's [conv.double] section:

... A prvalue of standard floating-point type can be converted to a prvalue of another standard floating-point type.

2 If the source value can be exactly represented in the destination type, the result of the conversion is that exact representation. If the source value is between two adjacent destination values, the result of the conversion is an implementation-defined choice of either of those values. Otherwise, the behavior is undefined.

Both float and double (and also long double, FWIW) are "standard floating-point types", so the above applies. In a narrowing conversion from double to float, many double values cannot "be exactly represented in the destination type". For those, the "source value is between two adjacent destination values" rule kicks in: the implementation picks one of the two, and precision is lost. That leaves the case where the double's value is not between two representable float values, which is undefined behaviour. In practice, this means the conversion to float is undefined for doubles with too-large absolute value: below std::numeric_limits<float>::lowest() or above std::numeric_limits<float>::max().

Of course, implementations may provide implementation-defined behaviours for situations that the Standard leaves undefined (just as implementations following IEEE 754 produce infinity or NaN for floating-point division by zero, whereas the Standard leaves that undefined).
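
To stay clear of the undefined case regardless of the implementation, the range can be checked before converting. A minimal sketch: try_narrow and is_exact_or_adjacent are hypothetical names, and rejecting NaN is a choice made for this sketch rather than something the Standard requires.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: converts only when d lies within float's finite range,
// so the conversion never reaches the "otherwise undefined" case.
// Returns false (leaving out untouched) for out-of-range values and NaN.
bool try_narrow(double d, float& out)
{
    if (std::isnan(d))
        return false;
    if (d < std::numeric_limits<float>::lowest() ||
        d > std::numeric_limits<float>::max())
        return false;
    out = static_cast<float>(d);
    return true;
}

// Checks the guarantee quoted above: f is either exactly d, or one of the
// two float values adjacent to d.
bool is_exact_or_adjacent(double d, float f)
{
    const double fd = f;  // widening float -> double is always exact
    if (fd == d)
        return true;
    const float inf = std::numeric_limits<float>::infinity();
    if (fd < d)  // f must be the closest float below d
        return static_cast<double>(std::nextafter(f, inf)) > d;
    // otherwise f must be the closest float above d
    return static_cast<double>(std::nextafter(f, -inf)) < d;
}
```

For example, try_narrow(0.1, f) succeeds and is_exact_or_adjacent(0.1, f) holds whichever neighbour the implementation picks, while try_narrow(1e300, f) is rejected.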

Upvotes: 4
