Reputation: 46482

Compact format for floating-point numbers

There are special formats (base-128) designed for transmitting integers used in protobufs and elsewhere. They're advantageous when most the integers are small (they need a single byte for smallest numbers and may waste one byte for others).

I wonder if there's something similar for floating point numbers under the assumption that most of them are actually small integers?

To address the answer by Alice: I was thinking about something like

void putCompressedDouble(double x) {
    int n = (int) x;
    boolean fits = (n == x);
    putBoolean(fits);
    if (fits) {
        putCompressedInt(n);
    } else {
        putUncompressedLong(Double.doubleToLongBits(x));
    }
}

This works (except for the negative zero, which I really don't care about), but it's wasteful in case of fits == true.

Upvotes: 1

Answers (3)

maaartinus

Reputation: 46482

I really like Durandal's solution. Despite its simplicity, it performs pretty well, at least for floats. For doubles with their exponent longer than one byte, some additional bit rearrangement might help. The following table gives the encoding length for numbers with up to D digits, negative numbers are also considered. In each column the first number given the maximum bytes needed while the parenthesized number is the average.

D   AS_INT    REV_FLOAT REV_DOUBLE BEST
1:  1 (1.0)   2 (1.8)   3 (2.2)    1 (1.0)
2:  2 (1.4)   3 (2.4)   3 (2.8)    2 (1.7)
3:  2 (1.9)   3 (2.9)   4 (3.2)    2 (2.0)
4:  3 (2.2)   4 (3.3)   4 (3.8)    3 (2.6)
5:  3 (2.9)   4 (3.9)   5 (4.1)    3 (3.0)
6:  3 (3.0)   5 (4.2)   5 (4.8)    4 (3.5)
7:  4 (3.9)   5 (4.8)   6 (5.1)    4 (3.9)
8:  4 (4.0)   5 (4.9)   6 (5.8)    5 (4.3)
9:  5 (4.9)   5 (4.9)   6 (6.0)    5 (4.9)

Four different methods were tested:

AS_INT: Simply convert the number to int. This is unusable but gives us a lower bound.
REV_FLOAT: The method by Durandal applied to floats.
REV_DOUBLE: The method by Durandal applied to doubles.
BEST: An improvement of my own method as described in the question. Rather complicated.

Upvotes: 1

Durandal

Reputation: 20069

It depends on the distribution of your numbers. Magnitude doesn't really matter that much, since its expressed through the exponent field of a float. Its usually the mantissa that contributes the most "weight" in terms of storage.

If your floats are mainly integers, you may gain something by converting to int (via Float.floatToIntBits()), and checking how many trailing zeros there are (for small int values there should be up to 23 trailing zeros). When using a simple scheme to encode small int's, you may implement encoding floats simply as:

int raw = Float.floatToIntBits(f);
raw = Integer.reverse(raw);
encodeAsInt(raw);

(Decoding is simply reversing the process). What this does is simply move the trailing zeros in the mantissa to the most significant bits of the int representation, which is friendly to encoding schemes devised for small integers.

Same can be applied to double<->long.

Upvotes: 1

Alice

Reputation: 3986

Probably not, and this is almost certainly not something you want.

As noted at this stack overflow post, floating point numbers are not stored in a platform independent way in protocol buffers; they are essentially bit for bit representations that are then cast using a union. This means float will take 4 bytes and double 8 bytes. This is almost certainly what you want.

Why? Floating points are not integers. The integers are a well formed group; each number is valid, every bit pattern represents a number, and they exactly represent the integer they are. Floating points cannot represent many important numbers exactly: most floats can't represent 0.1 exactly, for example. The problem of infinities, NAN's, etc etc, all make a compressed format a non-trivial task.

If you have small integers in a float, then convert them to small integers or some fixed point precision format. For example, if you know you only have....4 sigfigs, you can convert from floating point to a fixed point short, saving 2 bytes. Just make sure each end knows how to deal with this type, and you'll be golden.

But any operation that google could do to try and save space in this instance would be both reinventing the wheel and potentially dangerous. Which is probably why they try not to mess with floats at all.

Upvotes: 0

Compact format for floating-point numbers

Answers (3)

Related Questions