Reputation: 334
I need to write a dynamic compression scheme for float values. The dynamic part is that the end user will provide the bounds and precision, and the rest has to be handled.
The first part is to figure out how many bits are needed. The formula I use is the following:
log2(bound x precision) + 1
So as an example
bound: 256 Units | precision: 512 per Unit
So the bits needed are 17, plus 1 for the sign bit; in total I need 18 bits.
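In Python the calculation looks roughly like this (the ceil is my addition, so it also rounds up when bound * precision is not a power of two):

    import math

    def bits_needed(bound, precision):
        # log2(bound * precision) magnitude bits, plus one sign bit
        return math.ceil(math.log2(bound * precision)) + 1

    print(bits_needed(256, 512))  # 18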
My question now is: how do I convert a number into my compressed format?
As an example the number 212.128456
There is for sure some precision loss, which is fine, but I can't wrap my head around the math needed to bring this value into range.
The biggest magnitude you can produce with n bits is 2^(n-1), so for 18 bits that is 2^17 = 131072, which gives a value range of [-131072, 131071].
So if I shift 212.128456 by 1000 I get 212128.456. Taking only the absolute value, I have 212128. This is bigger than 131071. So how can I fit this in?
What is the correct way to fit this number in?
Upvotes: 1
Views: 413
Reputation: 22564
Short answer: You shifted your value by 1000, but the shift should have been by only 512.
Longer answer: Let's use your example of a bound of 256--I'll assume that means the numbers are limited to the interval [-256, 255]. The precision is 512--I'll assume that means you want the absolute precision to be 1/512, i.e. the number we store will be at most 1/512 away from the given number.
As you say, we need 18 bits to do this. How do we store the number 212.128456?
First we split that number into its sign and its absolute value, which are + and 212.128456 respectively. Since the precision is 512, we multiply that absolute value by 512 and get 108609.769472. We now round that to the nearest integer and get 108610. Note that 2**17 is 131072, so our encoded number so far fits into 17 bits.
To get the sign bit in there, we store the sign in the top bit (sign-magnitude style): we multiply the sign value by 2**17. Nonnegative numbers get a sign value of 0, while negatives get a sign value of 1. In our case 0 times 131072 is 0, and we add that to our encoded number so far to get our final value of 108610. That fits into 18 bits.
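Here is a minimal sketch of that encoding in Python (the function name and defaults are mine, just for illustration):

    def encode(value, precision=512, magnitude_bits=17):
        # Split into sign and absolute value, scale by the precision,
        # then round to the nearest integer step.
        sign = 1 if value < 0 else 0
        magnitude = round(abs(value) * precision)
        # Store the sign in the top bit.
        return sign * 2**magnitude_bits + magnitude

    print(encode(212.128456))  # 108610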
To decode that number and see what we actually stored, we do this backwards. We know our encoded number is 108610 in 18 bits. First we get the sign by comparing our encoded number with 2**17, namely 131072. Our encoded number is less, so the sign of our decoded number will be nonnegative and we keep working with the value 108610. If our encoded number were greater than or equal to 2**17, we would have said the final sign is negative and subtracted 2**17 from the encoded value to get our next working value.
Since our precision is 512 we now divide our number 108610 by 512 and get 212.12890625. Finally, we prepend the sign we found earlier and get the decoded value 212.12890625.
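The matching decode sketch, under the same assumptions:

    def decode(encoded, precision=512, magnitude_bits=17):
        # The top bit holds the sign; the rest is the scaled magnitude.
        sign_bit = 2**magnitude_bits
        if encoded >= sign_bit:
            return -((encoded - sign_bit) / precision)
        return encoded / precision

    print(decode(108610))  # 212.12890625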
How close is that to the original value 212.128456? Subtracting those two values yields 0.00045025. (My Python actually returned a very slightly larger value than that, due to floating point imprecision, but that does not change this discussion.) That is smaller than 1/2200 so we easily beat the bound of 1/512.
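With the two sketches above you can check the round-trip error directly:

    original = 212.128456
    error = abs(decode(encode(original)) - original)
    print(error)           # roughly 0.00045025
    print(error <= 1/512)  # True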
Final Note: Since in the encoding process I multiplied by 512 and then rounded to the nearest integer, the precision of this process would actually be 1024 (an error of at most 1/1024) if there are no floating-point imprecisions in the calculations. That means you might be able to get away with using one less bit to store the encoded value, which you would accomplish by multiplying by 256 rather than 512. However, you cannot be sure that there will be no floating-point imprecisions in the calculations, so if that 512 bound is a hard one you should keep using the extra bit.
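To illustrate that point, here is the same value scaled by 256 instead of 512; rounding keeps the error within 1/512 even with one less magnitude bit:

    magnitude = round(212.128456 * 256)       # 54305, fits in 16 bits
    print(abs(magnitude / 256 - 212.128456))  # roughly 0.00045, still within 1/512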
Upvotes: 3