Ben Zotto

Reputation: 71008

Bitwise conversion of int64 to IEEE double?

I'm trying to find or figure out the algorithm for converting a signed 64-bit int (two's-complement, natch) to the closest IEEE double (64-bit) value, staying within bitwise operations. What I'm looking for is generic "C-like" pseudocode; I'm implementing a toy JVM on a platform that is not C and doesn't have a native int64 type, so I'm operating on 8-byte arrays (details of that are mercifully outside this scope) and that's the domain the data needs to stay in.

So: input is a big-endian string of 64 bits, signed twos-complement. Output is a big-endian string of 64 bits in IEEE double format that represents as near the original int64 value as possible. In between is some set of masks, shifts, etc! Algorithm absolutely does not need to be especially clever or optimized. I just want to be able to get to the result and ideally understand what the process is.

Having trouble tracking this down because I suspect it's an unusual need. This answer addresses a parallel question (I think) in x86 SSE, but I don't speak SSE and my attempts at translation leave me more confused than enlightened.

Would love for someone to either point me in the right direction for a recipe or, ideally, explain the bitwise math behind it so I actually understand it. Thanks!

Upvotes: 0

Views: 274

Answers (1)

Carl Norum

Reputation: 224964

Here's a simple (and wrong in several ways) implementation, including a test harness.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

double do_convert(int64_t input)
{
    uint64_t sign = (input < 0);
    uint64_t magnitude;

    // breaks on INT64_MIN
    if (sign)
        magnitude = -input;
    else
        magnitude = input;    

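    // use your favourite algorithm here instead of the builtin;
    // __builtin_clzl assumes a 64-bit unsigned long and is undefined for 0
    int leading_zeros = __builtin_clzl(magnitude);
    uint64_t exponent = (63 - leading_zeros) + 1023;
    // shift the leading 1 off the top, then down into the 52-bit field;
    // any bits that don't fit are truncated, not rounded
    uint64_t significand = (magnitude << (leading_zeros + 1)) >> 12;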

    uint64_t fake_double = sign << 63
                         | exponent << 52
                         | significand;

    double d;
    memcpy(&d, &fake_double, sizeof d);

    return d;
}

int main(int argc, char** argv)
{
    for (int i = 1; i < argc; i++)
    {
        long l = strtol(argv[i], NULL, 0);
        double d = do_convert(l);
        printf("%ld %f\n", l, d);
    }

    return 0;
}

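The breakages here are many. The basic idea is to extract the sign bit first and then treat the number as positive the rest of the way, which won't work if the input is INT64_MIN, because its magnitude doesn't fit in an int64_t. It also doesn't handle an input of 0 correctly: zero has no leading one bit, so the exponent calculation doesn't apply (and the count-leading-zeros builtin is undefined for 0). Finally, magnitudes that need more than 53 significant bits are truncated rather than rounded to the nearest double. These extensions are left as an exercise for the reader. ;-)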
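For what it's worth, here is one way those exercises might be filled in. Treat it as a sketch rather than a reference implementation: the names int64_to_double_bits and clz64_loop are made up for illustration, it returns the raw 64-bit pattern instead of a double (closer to the byte-array setting in the question), and it assumes the usual round-to-nearest, ties-to-even behaviour is what's wanted.

```c
#include <stdint.h>

/* Hypothetical helper: count leading zeros of a NONZERO 64-bit value
   with a plain loop, standing in for the compiler builtin. */
static int clz64_loop(uint64_t x)
{
    int n = 0;
    while ((x & ((uint64_t)1 << 63)) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical name: returns the IEEE 754 binary64 bit pattern
   nearest to input (round to nearest, ties to even). */
uint64_t int64_to_double_bits(int64_t input)
{
    if (input == 0)
        return 0;                        /* +0.0 is all zero bits */

    uint64_t sign = (input < 0);
    /* Negate in unsigned arithmetic so INT64_MIN becomes 2^63
       instead of overflowing. */
    uint64_t magnitude = sign ? (uint64_t)0 - (uint64_t)input
                              : (uint64_t)input;

    int lz = clz64_loop(magnitude);
    uint64_t exponent = (uint64_t)(63 - lz) + 1023;

    /* Normalize so the leading 1 sits in bit 63. */
    uint64_t norm = magnitude << lz;
    /* Bits 62..11 are the 52 stored fraction bits; bits 10..0 don't fit. */
    uint64_t fraction = (norm >> 11) & 0xFFFFFFFFFFFFFull;
    uint64_t dropped = norm & 0x7FF;

    /* Round up if the dropped bits are more than half a unit,
       or exactly half and the kept fraction is odd. */
    if (dropped > 0x400 || (dropped == 0x400 && (fraction & 1))) {
        fraction++;
        if (fraction >> 52) {            /* carry out of the fraction */
            fraction = 0;
            exponent++;
        }
    }

    return (sign << 63) | (exponent << 52) | fraction;
}
```

With this sketch, int64_to_double_bits(5) comes out as 0x4014000000000000, and INT64_MAX rounds up to 2^63 (0x43E0000000000000), which is also what a compiler's (double) cast produces under the default rounding mode.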

Anyway - the algorithm just figures out the exponent by calculating the floor of log2 of the magnitude (the position of its highest set bit) and offsetting by 1023 (the IEEE double exponent bias). It then gets the significand by shifting the number up far enough to drop off the most significant bit, which is the implicit leading 1 that a normalized double doesn't store, and then shifting back down into the right field position.
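As a worked example, take the input 5, which is 101 in binary. The highest set bit is bit 2, so the exponent field is 2 + 1023 = 1025 (0x401). Dropping the leading 1 leaves the bits 01, which land at the top of the 52-bit field as 0x4000000000000. Together with a sign of 0 that gives 0x4014000000000000. The small helper below pulls a bit pattern apart again so results like this can be checked; the struct and function names are hypothetical, not part of any library.

```c
#include <stdint.h>

/* Hypothetical helper type: the three fields of a 64-bit IEEE double. */
struct double_fields {
    uint64_t sign;      /* 1 bit */
    uint64_t exponent;  /* 11 bits, biased by 1023 */
    uint64_t fraction;  /* 52 bits, leading 1 not stored */
};

/* Split a raw 64-bit pattern into sign, exponent and fraction. */
struct double_fields split_double_bits(uint64_t bits)
{
    struct double_fields f;
    f.sign = bits >> 63;
    f.exponent = (bits >> 52) & 0x7FF;
    f.fraction = bits & 0xFFFFFFFFFFFFFull;
    return f;
}
```

Calling split_double_bits(0x4014000000000000) should hand back sign 0, exponent 1025 and fraction 0x4000000000000, matching the walk-through.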

After all that, the assembly of the final double is pretty straightforward.
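Since the question works on big-endian 8-byte arrays rather than native integers, the assembled pattern still has to get in and out of that form. A minimal sketch of that step, assuming the bytes are ordered most significant first (load_be64 and store_be64 are illustrative names, not existing functions):

```c
#include <stdint.h>

/* Read 8 big-endian bytes into a 64-bit value (most significant byte first). */
uint64_t load_be64(const unsigned char in[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | in[i];
    return v;
}

/* Write a 64-bit value out as 8 big-endian bytes. */
void store_be64(uint64_t v, unsigned char out[8])
{
    for (int i = 7; i >= 0; i--) {
        out[i] = (unsigned char)(v & 0xFF);
        v >>= 8;
    }
}
```

On a platform without a 64-bit type, the same shifts and masks would have to be carried across the individual bytes, but the byte order and field positions stay the same.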

Edit:

Speaking of exercises for the reader - I also implemented this program using __builtin_clzl(), which is a GCC/Clang builtin. You can expand that part as necessary.
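If the builtin isn't available, one possible expansion is a binary search over the bit positions using only shifts and comparisons. This is a sketch under the assumption that returning 64 for a zero input is acceptable; the function name is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical replacement for the builtin: number of zero bits above
   the highest set bit. Returns 64 for x == 0. */
int count_leading_zeros64(uint64_t x)
{
    if (x == 0)
        return 64;

    int n = 0;
    /* At each step, if the top half of the remaining window is empty,
       count it and shift the lower half up into its place. */
    if ((x >> 32) == 0) { n += 32; x <<= 32; }
    if ((x >> 48) == 0) { n += 16; x <<= 16; }
    if ((x >> 56) == 0) { n += 8;  x <<= 8;  }
    if ((x >> 60) == 0) { n += 4;  x <<= 4;  }
    if ((x >> 62) == 0) { n += 2;  x <<= 2;  }
    if ((x >> 63) == 0) { n += 1; }
    return n;
}
```

For instance, count_leading_zeros64(1) is 63 and count_leading_zeros64(0xFF) is 56. The same halving idea carries over to a byte array: find the first nonzero byte, then count the leading zeros within it.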

Upvotes: 1
