CoffeDeveloper

Reputation: 8315

Smart bit encoding of floating point values (float, double)

Assume I have an array of floating-point values, and it is currently my bottleneck for disk loading and package size. How do I encode those values to reduce data size, given the following 3 inputs:

- the maximum absolute error allowed for each value
- the minimum value in the array
- the maximum value in the array

Note: If a float value is too small it will just be converted to 0 (because the whole mantissa is removed due to the use of an absolute error; for too-big numbers the mantissa is left untouched).

The order in which floating-point values appear is important (it is NOT just a point cloud). So whatever the encoding is, decoding must preserve the order.

So far I have not been able to save as many bits as possible:

I can save 3 to 5 bits in the exponent field using the Min/Max values, but I was not able to exploit the absolute error, nor the extra bits in the Min/Max values. I can also cut 1 to 3 mantissa bits if the Max value is not too big, but here again I'm not using the 3 inputs optimally.

I'm not looking for compression, just for a better encoding (the package is already 7-zipped). Also note I'm referring to regular float and double values, the ones you can find in C# and C++.

Important notice: when the exponent is big enough, the absolute error of IEEE floating-point numbers becomes bigger than the one I use as input (first point in the bullet list). This is not a problem, since that data started and ended as floating point anyway. What is important is dropping useless mantissa bits.

I already tried fixed-point numbers; they work, but only up to a certain degree (see my comment on tera's answer).

Upvotes: 1

Views: 1232

Answers (3)

Sneftel

Reputation: 41474

If you're specifically storing vertex position data for a large mesh, there are some other dirty tricks you can employ. Which tricks work best really depends on the types of vertex clouds you'll be encountering.

One simple trick I've seen for level data (doctors hate him!) is to partition vertices by their sign and exponent. For instance, you'd have a bin specifically for +x, -y, +z vertices with exponents of 138, 140, and 131 respectively. Then you store the signs and exponents for the run (27 bits), the number of vertices in the run, and the mantissa triple for each (69 bits). The compression rate here isn't amazing, but it's an easy method to implement.

Similarly, you can store runs of nearby vertices, and use Golomb-Rice coding to specify the deltas between each pair. This is more complicated, and requires careful tuning and clustering, but can potentially produce extremely good compression ratios.

Upvotes: 1

Sneftel

Reputation: 41474

If you care about absolute error, as opposed to relative error, you should not be using floating point numbers. Floating point is terrible at bounding absolute error; at high magnitudes it fails to achieve the needed bound, and at low magnitudes it wastes bits on precision you don't care about.

You say that in fixed point "big numbers can't be represented without sacrificing lower bits", but that has nothing to do with fixed versus floating point. If you have n bits, and require a maximum absolute error of ε, you will never be able to represent a magnitude range greater than ε*2^(n+1). But if you use floating point numbers, that n will be smaller, because you're spending bits on exponent instead of mantissa. That can never expand your magnitude range without violating your error bounds.

If you have an absolute range, and an absolute error bound, and are concerned with representation as opposed to computation, fixed point is objectively the best you can do; floating point has no advantage in this particular situation.

Oh, and note that all of that applies even if you've already lost precision from computation in an intermediate floating point storage. If you've violated your error bounds purely from that, then obviously it's the computation that needs to be fixed. Otherwise, the values you get out are better stored as fixed point than floating point.

Upvotes: -1

tera

Reputation: 7255

Looks to me as if you are looking for a fixed-point representation. Subtract the minimum value, divide by two times the absolute error limit, and round to the nearest integer.

Upvotes: 3

Related Questions