Reputation: 46482
There are special formats (base-128) designed for transmitting integers used in protobufs and elsewhere. They're advantageous when most the integers are small (they need a single byte for smallest numbers and may waste one byte for others).
I wonder if there's something similar for floating point numbers under the assumption that most of them are actually small integers?
To address the answer by Alice: I was thinking about something like
void putCompressedDouble(double x) {
int n = (int) x;
boolean fits = (n == x);
putBoolean(fits);
if (fits) {
putCompressedInt(n);
} else {
putUncompressedLong(Double.doubleToLongBits(x));
}
}
This works (except for the negative zero, which I really don't care about), but it's wasteful in case of fits == true
.
Upvotes: 1
Views: 1020
Reputation: 46482
I really like Durandal's solution. Despite its simplicity, it performs pretty well, at least for float
s. For double
s with their exponent longer than one byte, some additional bit rearrangement might help. The following table gives the encoding length for numbers with up to D
digits, negative numbers are also considered. In each column the first number given the maximum bytes needed while the parenthesized number is the average.
D AS_INT REV_FLOAT REV_DOUBLE BEST
1: 1 (1.0) 2 (1.8) 3 (2.2) 1 (1.0)
2: 2 (1.4) 3 (2.4) 3 (2.8) 2 (1.7)
3: 2 (1.9) 3 (2.9) 4 (3.2) 2 (2.0)
4: 3 (2.2) 4 (3.3) 4 (3.8) 3 (2.6)
5: 3 (2.9) 4 (3.9) 5 (4.1) 3 (3.0)
6: 3 (3.0) 5 (4.2) 5 (4.8) 4 (3.5)
7: 4 (3.9) 5 (4.8) 6 (5.1) 4 (3.9)
8: 4 (4.0) 5 (4.9) 6 (5.8) 5 (4.3)
9: 5 (4.9) 5 (4.9) 6 (6.0) 5 (4.9)
Four different methods were tested:
int
. This is unusable but gives us a lower bound.float
s.double
s.Upvotes: 1
Reputation: 20069
It depends on the distribution of your numbers. Magnitude doesn't really matter that much, since its expressed through the exponent field of a float. Its usually the mantissa that contributes the most "weight" in terms of storage.
If your floats are mainly integers, you may gain something by converting to int (via Float.floatToIntBits()), and checking how many trailing zeros there are (for small int values there should be up to 23 trailing zeros). When using a simple scheme to encode small int's, you may implement encoding floats simply as:
int raw = Float.floatToIntBits(f);
raw = Integer.reverse(raw);
encodeAsInt(raw);
(Decoding is simply reversing the process). What this does is simply move the trailing zeros in the mantissa to the most significant bits of the int representation, which is friendly to encoding schemes devised for small integers.
Same can be applied to double<->long.
Upvotes: 1
Reputation: 3986
Probably not, and this is almost certainly not something you want.
As noted at this stack overflow post, floating point numbers are not stored in a platform independent way in protocol buffers; they are essentially bit for bit representations that are then cast using a union. This means float will take 4 bytes and double 8 bytes. This is almost certainly what you want.
Why? Floating points are not integers. The integers are a well formed group; each number is valid, every bit pattern represents a number, and they exactly represent the integer they are. Floating points cannot represent many important numbers exactly: most floats can't represent 0.1 exactly, for example. The problem of infinities, NAN's, etc etc, all make a compressed format a non-trivial task.
If you have small integers in a float, then convert them to small integers or some fixed point precision format. For example, if you know you only have....4 sigfigs, you can convert from floating point to a fixed point short, saving 2 bytes. Just make sure each end knows how to deal with this type, and you'll be golden.
But any operation that google could do to try and save space in this instance would be both reinventing the wheel and potentially dangerous. Which is probably why they try not to mess with floats at all.
Upvotes: 0