Reputation: 93

Describe list of numbers by unique identifier

I think it’s pretty trivial question. I have a system of particles, each has a coordinate. I need to describe every state of the system(i.e. positions of all particles at each time step) by a number.

Multiplying them is wrong:

x1=0 * y1=0 * z1=0 * x2=1 * y2=1 * z2=1 = 0

and, for example,

x1=1 * y1=1 * z1=1 * x2=0 * y2=0 * z2=0 = 0

gives the same result, although it’s different states.

The algorithm below, is a bit better but still wrong

1*(x1=1 * y1=1 * z1=1) + 2*(x2=0 * y2=0 * z2=0) + ...

How to convert a list of number to unique number?

Upvotes: 1

Answers (4)

Luis Colorado

Reputation: 12708

If the diferent values (discrete, bounded) of the variables that define the state, the most compact way to encode their state is using the following formula.

Let the cardinalities of each variable be n0, n1, n2, n3..., while the variable values are v0, v1, v2`... The formula is:

v0 + n0*v1 + n0*n1*v2 + n0*n1*n2 * v3 + .... + n0*n1*..*nm-1 * vn

if you compare it with the expresion for a number in base B with digits d0, d1...

d0 + B * d1 + B^2 * d2 + B^n-1 * bn-1

You will see that the only thing we have extended is that the base for each digit is different.

This is applied, for example, to compute in a compact form the number of seconds in a day from the midnight as considering the digits in these sequence of basis:

60, 60, 24, 365, 100
ss, mm, hh, jd, c

for seconds, minutes, hours, julian days, years, centuries, etc.

Upvotes: 0

JaMiT

Reputation: 17072

How to convert a list of number to unique number?

There are various mathematical tricks for this (e.g. leveraging prime factorization), but that does not solve your problem, since the "unique number" is potentially quite large. (The upper bound must be at least the number of possible states.) The number of bits required to store the "unique number" may very well be the same as the sum of the number of bits required to store each number in the list. In that case, even though you call it a single number, it takes the computer just as much work to compute and store it as it would take to store the raw list.

To put the situation in more familiar terms: pick up a pencil and choose between writing ten thousand 2-digit numbers or a single 20,000-digit number on a sheet of paper. Which option is easier?

Upvotes: 1

midor

Reputation: 5557

Assuming that your system has N elements with (x,y,z) coordinates where each of x,y,z is represented by a double(64-bit), your point probably looks similar to this:

struct point { double x, y, z;};

The simplest way to express that as a unique "number" is to represent your system as an array of points of size N and interpret its binary representation as number. An array is contiguous in memory so it is a number in itself. The easiest way to interpret it as a number is by interpreting its bytes as big integer, where the overall number is computed as

state[0] + 2^8 *state[1] + 2^16 * state[2] + ... 2^*((n - 1)*8) * state[n-1]

To compute this you would of course need a bigint library, and it is probably not useful for many purposes. If some numbers repeat often and there a not too many of them this can however already be helpful, since you can start to replace them with simple ids during the first run through the data by aliasing them to a smaller id. For any given run, the size of the id will only have the size of the number of different configurations, plus the overhead to store a translation table from the "number" to the smaller id. Hence this obviously only pays off if configurations repeat.

Why can't you use anything smaller? If you used any smaller value to represent the x,y, and z coordinates, two values would be represented by the same number. The type does not matter here: if your point's were bool, you see even more clearly that any smaller value would have to conflate 0 and 1.

A system with (2^64 * 2^64 * 2^64)^N possible states cannot be represented by any fewer bits while still distinguishing each state uniquely.

You can use hashing to make lookup of these states faster if you are not looking at a situation where most of your position in the state space occur. This is however clearly based on a low probability of collisions. The hash just maps your big number to a smaller one in a way that reduces collisions between elements. If you hash, e.g., the contents of the aforementioned point-array, you can look up a certain state, e.g., to increment a count of how many times you have seen it before or add the time-steps at which the state was observed to a list keyed by this hash in a hash map.

You can use lossless compression if your states have an uneven probability distribution to use less memory for the most frequent states. You still need the same worst-case memory, but in a simplified example you would assign 0 to your most probably value, 1 to the next most probably etc. Algorithms to look up here are huffmann coding and LZ/DEFLATE. They are typically readily available through libraries.

Upvotes: 1

Nitheesh George

Reputation: 1367

Just a suggestion, to generate a unique number for a 3D point, device a formula like below

assume 255 is your unique factor and considering your equation

x1=0 * y1=0 * z1=0 * x2=1 * y2=1 * z2=1 = 0

You could yield a unique number by,

 UniqueNumber = (255 * x1) + (512 * y1) + 1024 * z1) + (2048 * x2) + (4096* y2) + (8192 * z2)

Upvotes: 0

Describe list of numbers by unique identifier

Answers (4)

Related Questions