Reputation: 842
Some time ago, I came across a pair of functions in some CAD code to encode a set of coordinates (a pair of float
s) as a single float
(to use as a hash key), and then to unpack that single float back into the original pair.
The forward and backward functions only used standard mathematical operations -- no magic fiddling with bit-level representations of floats, no extracting and interleaving individual digits or anything like that. Obviously the reversal is not perfect in practice, because you lose considerable precision going from two floats to one, but according to the Wikipedia page for the function it should have been exactly invertible given infinite precision arithmetic.
Unfortunately, I don't work on that code anymore, and I've forgotten the name of the function so I can't look it up on Wikipedia again. Anybody know of a named mathematical functions that meets that description?
Upvotes: 3
Views: 889
Reputation: 26195
In a comment, OP clarified that the desired function should map two IEEE-754 binary64
operands into a single such operand. One way to accomplish this is to map each binary64
(double-precision) number into a positive integer in [0, 226-2], and then use a well-known pairing function to map two such integers into a single positive integer the interval [0,252), which is exactly representable in a binary64
which has 52 stored significand ("mantissa") bits.
As the binary64
operands are unrestricted in range per a comment by OP, all binades should be representable with equal relative accuracy, and we need to handle zeroes, infinities, and NaNs as well. For this reason I chose log2()
to compress the data. Zeros are treated as the smallest binary64
subnormal, 0x1.0p-1074
, which has the consequence that both 0x1.0p-1074
and zero will decompress into zero. The result from log2
falls into the range [-1074, 1024). Since we need to store the sign of the operand, we bias the logarithm value by 1074, giving a result in [0, 2098), then scale that to almost [0, 225), round to the nearest integer, and finally affix the sign of the original binary64
operand. The motivation for almost utilizing the complete range is to leave a little bit of room at the top of the range for special encodings for infinity and NaN (so a single canonical NaN encoding is used).
Since pairing functions known from the literature operate on natural numbers only, we need a mapping from whole numbers to natural numbers. This is easily accomplished by mapping negative whole numbers to odd natural numbers, while positive whole numbers are mapped to even natural numbers. Thus our operands are mapped from (-225, +225) to [0, 226-2]. The pairing function then combines two integers in [0, 226-2] into a single integer in [0, 252).
Different pairing functions known from the literature differ in their scrambling behavior, which may impact the hashing functionality mentioned in the question. They may also differ in their performance. Therefore I am offering a selection of four different pairing functions for the pair()
/ unpair()
implementations in the code below. Please see the comments in the code for corresponding literature references.
Unpacking of the packed operand involves applying the inverse of each packing step in reverse order. The unpairing function gives us two natural integers. These are mapped to two whole numbers, which are mapped to two logarithm values, which are then exponentiated with exp2()
to recover the original numbers, with a bit of added work to get special values and the sign correct.
While logarithms are represented with a relative accuracy on the order of 10-8, the expected maximum relative error in final results is on the order of 10-5 due to the well-known error magnification property of exponentiation. Maximum relative error observed for a pack()
/ unpack()
round-trip in extensive testing was 2.167e-5.
Below is my ISO C99 implementation of the algorithm together with a portion of my test framework. This should be portable to other programming languages with a modicum of effort.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <math.h>
#include <float.h>
#define SZUDZIK_ELEGANT_PAIRING (1)
#define ROZSA_PETER_PAIRING (2)
#define ROSENBERG_STRONG_PAIRING (3)
#define CHRISTOPH_MICHEL_PAIRING (4)
#define PAIRING_FUNCTION (ROZSA_PETER_PAIRING)
/*
https://groups.google.com/forum/#!original/comp.lang.c/qFv18ql_WlU/IK8KGZZFJx4J
From: geo <[email protected]>
Newsgroups: sci.math,comp.lang.c,comp.lang.fortran
Subject: 64-bit KISS RNGs
Date: Sat, 28 Feb 2009 04:30:48 -0800 (PST)
This 64-bit KISS RNG has three components, each nearly
good enough to serve alone. The components are:
Multiply-With-Carry (MWC), period (2^121+2^63-1)
Xorshift (XSH), period 2^64-1
Congruential (CNG), period 2^64
*/
static uint64_t kiss64_x = 1234567890987654321ULL;
static uint64_t kiss64_c = 123456123456123456ULL;
static uint64_t kiss64_y = 362436362436362436ULL;
static uint64_t kiss64_z = 1066149217761810ULL;
static uint64_t kiss64_t;
#define MWC64 (kiss64_t = (kiss64_x << 58) + kiss64_c, \
kiss64_c = (kiss64_x >> 6), kiss64_x += kiss64_t, \
kiss64_c += (kiss64_x < kiss64_t), kiss64_x)
#define XSH64 (kiss64_y ^= (kiss64_y << 13), kiss64_y ^= (kiss64_y >> 17), \
kiss64_y ^= (kiss64_y << 43))
#define CNG64 (kiss64_z = 6906969069ULL * kiss64_z + 1234567ULL)
#define KISS64 (MWC64 + XSH64 + CNG64)
double uint64_as_double (uint64_t a)
{
double r;
memcpy (&r, &a, sizeof r);
return r;
}
#define LOG2_BIAS (1074.0)
#define CLAMP_LOW (exp2(-LOG2_BIAS))
#define SCALE (15993.5193125)
#define NAN_ENCODING (33554430)
#define INF_ENCODING (33554420)
/* check whether argument is an odd integer */
int is_odd_int (double a)
{
return (-2.0 * floor (0.5 * a) + a) == 1.0;
}
/* compress double-precision number into an integer in (-2**25, +2**25) */
double compress (double x)
{
double t;
t = fabs (x);
t = (t < CLAMP_LOW) ? CLAMP_LOW : t;
t = rint ((log2 (t) + LOG2_BIAS) * SCALE);
if (isnan (x)) t = NAN_ENCODING;
if (isinf (x)) t = INF_ENCODING;
return copysign (t, x);
}
/* expand integer in (-2**25, +2**25) into double-precision number */
double decompress (double x)
{
double s, t;
s = fabs (x);
t = s / SCALE;
if (s == (NAN_ENCODING)) t = NAN;
if (s == (INF_ENCODING)) t = INFINITY;
return copysign ((t == 0) ? 0 : exp2 (t - LOG2_BIAS), x);
}
/* map whole numbers to natural numbers. Here: (-2^25, +2^25) to [0, 2^26-2] */
double map_Z_to_N (double x)
{
return (x < 0) ? (-2 * x - 1) : (2 * x);
}
/* Map natural numbers to whole numbers. Here: [0, 2^26-2] to (-2^25, +2^25) */
double map_N_to_Z (double x)
{
return is_odd_int (x) ? (-0.5 * (x + 1)) : (0.5 * x);
}
#if PAIRING_FUNCTION == SZUDZIK_ELEGANT_PAIRING
/* Matthew Szudzik, "An elegant pairing function." In Wolfram Research (ed.)
Special NKS 2006 Wolfram Science Conference, pp. 1-12.
Here: map two natural numbers in [0, 2^26-2] to natural number in [0, 2^52),
and vice versa
*/
double pair (double x, double y)
{
return (x != fmax (x, y)) ? (y * y + x) : (x * x + x + y);
}
void unpair (double z, double *x, double *y)
{
double sqrt_z = trunc (sqrt (z));
double sqrt_z_diff = z - sqrt_z * sqrt_z;
*x = (sqrt_z_diff < sqrt_z) ? sqrt_z_diff : sqrt_z;
*y = (sqrt_z_diff < sqrt_z) ? sqrt_z : (sqrt_z_diff - sqrt_z);
}
#elif PAIRING_FUNCTION == ROZSA_PETER_PAIRING
/*
Rozsa Peter, "Rekursive Funktionen" (1951), p. 44. Via David R. Hagen's blog,
https://drhagen.com/blog/superior-pairing-function/
Here: map two natural numbers in [0, 2^26-2] to natural number in [0, 2^52),
and vice versa
*/
double pair (double x, double y)
{
double mx = fmax (x, y);
double mn = fmin (x, y);
double sel = (mx == x) ? 0 : 1;
return mx * mx + mn * 2 + sel;
}
void unpair (double z, double *x, double *y)
{
double sqrt_z = trunc (sqrt (z));
double sqrt_z_diff = z - sqrt_z * sqrt_z;
double half_diff = trunc (sqrt_z_diff * 0.5);
*x = is_odd_int (sqrt_z_diff) ? half_diff : sqrt_z;
*y = is_odd_int (sqrt_z_diff) ? sqrt_z : half_diff;
}
#elif PAIRING_FUNCTION == ROSENBERG_STRONG_PAIRING
/*
A. L. Rosenberg and H. R. Strong, "Addressing arrays by shells",
IBM Technical Disclosure Bulletin, Vol. 14, No. 10, March 1972,
pp. 3026-3028.
Arnold L. Rosenberg, "Allocating storage for extendible arrays,"
Journal of the ACM, Vol. 21, No. 4, October 1974, pp. 652-670.
Corrigendum, Journal of the ACM, Vol. 22, No. 2, April 1975, p. 308.
Matthew P. Szudzik, "The Rosenberg-Strong Pairing Function", 2019
https://arxiv.org/abs/1706.04129
Here: map two natural numbers in [0, 2^26-2] to natural number in [0, 2^52),
and vice versa
*/
double pair (double x, double y)
{
double mx = fmax (x, y);
return mx * mx + mx + x - y;
}
void unpair (double z, double *x, double *y)
{
double sqrt_z = trunc (sqrt (z));
double sqrt_z_diff = z - sqrt_z * sqrt_z;
*x = (sqrt_z_diff < sqrt_z) ? sqrt_z_diff : sqrt_z;
*y = (sqrt_z_diff < sqrt_z) ? sqrt_z : (2 * sqrt_z - sqrt_z_diff);
}
#elif PAIRING_FUNCTION == CHRISTOPH_MICHEL_PAIRING
/*
Christoph Michel, "Enumerating a Grid in Spiral Order", September 7, 2016,
https://cmichel.io/enumerating-grid-in-spiral-order. Via German Wikipedia,
https://de.wikipedia.org/wiki/Cantorsche_Paarungsfunktion
Here: map two natural numbers in [0, 2^26-2] to natural number in [0, 2^52),
and vice versa
*/
double pair (double x, double y)
{
double mx = fmax (x, y);
return mx * mx + mx + (is_odd_int (mx) ? (x - y) : (y - x));
}
void unpair (double z, double *x, double *y)
{
double sqrt_z = trunc (sqrt (z));
double sqrt_z_diff = z - sqrt_z * (sqrt_z + 1);
double min_clamp = fmin (sqrt_z_diff, 0);
double max_clamp = fmax (sqrt_z_diff, 0);
*x = is_odd_int (sqrt_z) ? (sqrt_z + min_clamp) : (sqrt_z - max_clamp);
*y = is_odd_int (sqrt_z) ? (sqrt_z - max_clamp) : (sqrt_z + min_clamp);
}
#else
#error unknown PAIRING_FUNCTION
#endif
/* Lossy pairing function for double precision numbers. The maximum round-trip
relative error is about 2.167e-5
*/
double pack (double a, double b)
{
double c, p, q, s, t;
p = compress (a);
q = compress (b);
s = map_Z_to_N (p);
t = map_Z_to_N (q);
c = pair (s, t);
return c;
}
/* Unpairing function for double precision numbers. The maximum round-trip
relative error is about 2.167e-5 */
void unpack (double c, double *a, double *b)
{
double s, t, u, v;
unpair (c, &s, &t);
u = map_N_to_Z (s);
v = map_N_to_Z (t);
*a = decompress (u);
*b = decompress (v);
}
int main (void)
{
double a, b, c, ua, ub, relerr_a, relerr_b;
double max_relerr_a = 0, max_relerr_b = 0;
#if PAIRING_FUNCTION == SZUDZIK_ELEGANT_PAIRING
printf ("Testing with Szudzik's elegant pairing function\n");
#elif PAIRING_FUNCTION == ROZSA_PETER_PAIRING
printf ("Testing with Rozsa Peter's pairing function\n");
#elif PAIRING_FUNCTION == ROSENBERG_STRONG_PAIRING
printf ("Testing with Rosenberg-Strong pairing function\n");
#elif PAIRING_FUNCTION == CHRISTOPH_MICHEL_PAIRING
printf ("Testing with C. Michel's spiral pairing function\n");
#else
#error unkown PAIRING_FUNCTION
#endif
do {
a = uint64_as_double (KISS64);
b = uint64_as_double (KISS64);
c = pack (a, b);
unpack (c, &ua, &ub);
if (!isnan(ua) && !isinf(ua) && (ua != 0)) {
relerr_a = fabs ((ua - a) / a);
if (relerr_a > max_relerr_a) {
printf ("relerr_a= %15.8e a=% 23.16e ua=% 23.16e\n",
relerr_a, a, ua);
max_relerr_a = relerr_a;
}
}
if (!isnan(ub) && !isinf(ub) && (ub != 0)) {
relerr_b = fabs ((ub - b) / b);
if (relerr_b > max_relerr_b) {
printf ("relerr_b= %15.8e b=% 23.16e ub=% 23.16e\n",
relerr_b, b, ub);
max_relerr_b = relerr_b;
}
}
} while (1);
return EXIT_SUCCESS;
}
Upvotes: 2
Reputation: 41962
I don't know the name of the function, but you can normalize the 2 values x and y to the range [0, 1] using some methods like
X = arctan(x)/π + 0.5
Y = arctan(y)/π + 0.5
At this point X = 0.a1a2a3... and Y = 0.b1b2b3... Now just interleave the digits we can get a single float with value 0.a1b1a2b2a3b3...
At the receiving site just slice the bits and get back x and y from X and Y
x = tan((X - 0.5)π)
y = tan((Y - 0.5)π)
This works in decimal but also works in binary and of course it'll be easier to manipulate the binary digits directly. But probably you'll need to normalize the values to [0, ½] or [½, 1] to make the exponents the same. You can also avoid the use of bit manipulation by utilizing the fact that the significand part is always 24 bits long and we can just store x and y in the high and low parts of the significand. The result paired value is
r = ⌊X×212⌋/212 + Y/212
⌊x⌋ is the floor symbol. Now that's a pure math solution!
If you know the magnitudes of the values are always close to each other then you can improve the process by aligning the values' radix points to normalize and take the high 12 significant bits of the significand to merge together, no need to use atan
In case the range of the values is limited then you can normalize by this formula to avoid the loss of precision due to atan
X = (x - min)/(max - min)
Y = (y - min)/(max - min)
But in this case there's a way to combine the values just with pure mathematical functions. Suppose the values are in the range [0, max] the the value is r = x*max + y
. To reverse the operation:
x = ⌊r/max⌋;
y = r mod max
If min is not zero then just shift the range accordingly
Upvotes: 2