Serge

Reputation: 1601

Bijective analog of hashing

Integers from 0 to M need to be mapped to n-character codes drawn from a fixed alphabet. The tricky part is that the codes should look pseudo-random rather than sequential, like so:

0    BX07SU
1    TYN9RQ
2    QZ1697

I assume this is common knowledge that I am missing. What is this type of function called: f(i) = s, which maps an integer to a pseudo-random character string with no collisions within a given range?

A reverse function would be great too: h(s) = i, able to 'decode' a valid string back into an integer, or determine that the supplied string is invalid.

Upvotes: 3

Views: 1462

Answers (3)

phs

Reputation: 11051

For small M @user1494736 is right: just make a lookup table that happens to be bijective, and tailor it to your needs.
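A minimal sketch of the lookup-table approach in Python. The alphabet, code length, seed, and the name `build_tables` are all assumptions for illustration, not something from the question. It draws random codes with a seeded generator, skips duplicates, and keeps a forward list plus a reverse dictionary so that decoding and validity checks are a single lookup.

```python
import random
import string

# Hypothetical alphabet: digits plus uppercase letters (36 symbols).
ALPHABET = string.digits + string.ascii_uppercase


def build_tables(m, length=6, seed=2024):
    """Return (forward, inverse) tables covering the integers 0..m."""
    if m + 1 > len(ALPHABET) ** length:
        raise ValueError("not enough distinct codes for this range")
    rng = random.Random(seed)  # fixed seed so the tables are reproducible
    forward = []               # forward[i] is the code for integer i
    inverse = {}               # inverse[code] is the integer for that code
    while len(forward) <= m:
        code = "".join(rng.choices(ALPHABET, k=length))
        if code not in inverse:  # reject collisions to keep the map bijective
            inverse[code] = len(forward)
            forward.append(code)
    return forward, inverse


forward, inverse = build_tables(1000)
code = forward[42]
print(code, inverse.get(code))    # the code for 42, then 42 again
print(inverse.get("not-a-code"))  # None: unknown strings are invalid
```

The tables would have to be stored (or rebuilt from the same seed) wherever codes are issued or checked, which is why this only suits small M.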

For large M, storing the table becomes impractical; we need an algorithm to calculate the mapping. You mentioned we don't need strong cryptography; this frees us to choose something more lightweight than standard ciphers. That said, standard ciphers are quite fast, and I expect you could use one easily through a third-party library. But let's suppose you prefer to avoid this for some reason.

One way to make a poor man's cipher is to XOR a constant into your plaintext (to change the Hamming weight), and then apply a bit permutation (to destroy locality). Both of these steps are invertible. It sounds like you want to display your cipher letters as ASCII, so we may want a final invertible transform (an addition) to ensure the cipher values are printable.

You mentioned M might be as large as a few million. So, let's take your plaintext letters (and so ciphertext letters) to be 32-bit values. Finding a random value to XOR is easy, though its top 8 bits should be zero if the permutation described next is to rely on the high plaintext byte staying clear. How about the bit permutation?

A few million plaintext letters translate into each letter being around 22 or 23 bits wide (2^22 is about 4.2 million); this leaves at least 8 (logical) upper bits cleared in every plaintext letter. The permutation can take advantage of this to help ensure the ciphertext bytes are within the ASCII printable range. By sending the pieces of that upper plaintext byte to the upper two bits of each of the four cipher bytes, we ensure each cipher byte takes values 0 to 63. The final step could then add 48 to make the range 48 to 111, well within the ASCII printable range.

Playing off this observation, we could envision our plaintext letter as a 4x4 grid of bit pairs, and make our permutation a rotation of this grid:

Plain

A B C D   byte 3 (the high byte, expected to be all 0)
E F G H   byte 2
I J K L   byte 1
M N O P   byte 0

Cipher

D H L P   byte 3
C G K O   byte 2
B F J N   byte 1
A E I M   byte 0

Or put another way:

Byte       3    2    1    0
Plain   ABCD EFGH IJKL MNOP
Cipher  DHLP CGKO BFJN AEIM

Notice that a piece of each plaintext byte appears in every ciphertext byte: this is what will ensure the encoding "looks random".

Upvotes: 4

IamIC

Reputation: 18249

The simplest way to get a bijective hash is to use plain old CRC. For inputs exactly as wide as the checksum (32 or 64 bits) it is reversible in principle, but I've never seen code that does the reverse.
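A quick way to sanity-check the no-collision claim for 32-bit inputs, using `zlib.crc32` from the Python standard library. The helper name `crc_code` and the big-endian 4-byte encoding are assumptions for the example, and sampling a range like this is a spot check, not a proof.

```python
import zlib


def crc_code(i):
    """CRC-32 of the integer's 4-byte big-endian encoding."""
    return zlib.crc32(i.to_bytes(4, "big"))


sample = range(100_000)
codes = {crc_code(i) for i in sample}
print(len(codes) == len(sample))  # True when no two inputs in the sample collide
```

The result is still a 32-bit integer, so it would need a further encoding step to become an n-character code.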

Upvotes: 0

PermanentGuest

Reputation: 5331

I had posted an answer to a related question. In your case, you would need an additional mathematical function to map your number to a different number (maybe using a simple hash function) and then encode that number in base 36.
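As a sketch of what that could look like in Python: the mixing function here is a modular multiplication plus an offset, which is my assumption of a "simple hash" that stays invertible. The constants `MULT` and `OFFSET`, the 6-character length, and the function names are all hypothetical. `pow(MULT, -1, SPACE)` needs Python 3.8 or later.

```python
import math

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LENGTH = 6
SPACE = 36 ** LENGTH     # number of distinct 6-character codes
MULT = 1_580_030_173     # hypothetical multiplier; must be coprime to SPACE
OFFSET = 123_456_789     # hypothetical offset
assert math.gcd(MULT, SPACE) == 1
MULT_INV = pow(MULT, -1, SPACE)  # modular inverse, used when decoding


def encode36(i):
    if not 0 <= i < SPACE:
        raise ValueError("input out of range")
    value = (i * MULT + OFFSET) % SPACE  # invertible because MULT is coprime to SPACE
    chars = []
    for _ in range(LENGTH):
        value, digit = divmod(value, 36)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def decode36(code, max_value=SPACE - 1):
    """Return the integer for a valid code, or None if the code is invalid."""
    if len(code) != LENGTH or any(ch not in ALPHABET for ch in code):
        return None
    value = 0
    for ch in code:
        value = value * 36 + ALPHABET.index(ch)
    i = ((value - OFFSET) * MULT_INV) % SPACE
    return i if i <= max_value else None


for n in (0, 1, 2):
    print(n, encode36(n), decode36(encode36(n)))
```

With `max_value` set to M, most random 6-character strings decode to a number above M and are rejected, which gives a cheap validity check.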

Upvotes: 1
