Leonel
Leonel

Reputation: 2866

Transform a word into a unique identifier

My goal is to transform a word into a unique identifier (Number).

"WORD" => X(for instance)

But the Mathematical (or another approach) formula to be applied must be determinist, transitive, and the character order matters

For instance, let's define transform as the name of the Mathematical function like:

tranform("WORD") => X

Rules/Goals:

  1. Must be deterministic:

    if transform("WORD") => X then the transform formula must always return X. So the transform formula cannot be the "Memory address of variables" for instance.

  2. Transitive (Optional, It will be a nice to have for performance purposes)

    if tranform("WORD") => X then tranform("WORDS") => Y where Y > X.

  3. The order matters

    tranform("WORD") must be different than tranform("ROWD")

Any Idea/Approach?

I have already try the folowing approach but it's not correct:

  1. Ascii character code

    transform("WORD") = 87+79+82+68 = 316

    transform("ROWD") = 82+79+87+68 = 316

    So the rule 3 fails

Upvotes: 2

Views: 163

Answers (3)

asaad
asaad

Reputation: 441

Transforming the string to a base-n number is definitely the key (as described in the other answer). Here is how you can represent your transform function output:

I assume your transformation should be in the form of a decimal digit sequence. Also I assume that your characters are in the range of ASCII (128 total possibilities)

The key is to assign 3 digits for each of the letters in your initial string. The value of each three-digit position is set to the index of the letter (or character) from 0 to 127.

The transform function works like this:

transform(C1C2C3) = index[C1]index[C2]index[C3]

Where index returns a three digit sequence (i.e. 001, 020 or 120) that shows the position of the given character Cx. The results are concatenated to each other (not to be accumulated).

A sample would be:

transform("WORD") = 087079082068

Showing that the transform function covers all your rules is easy now.

Upvotes: 1

Chet
Chet

Reputation: 22065

It sounds like you're trying to generate a hashcode of sorts. Java's String implementation of hashCode() is fairly straightforward, so it wouldn't be difficult to implement something similar in another language.

The algorithm from the Java API Docs:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

s[i] is the ith character of the string, n is the length of the string

Upvotes: 1

orlp
orlp

Reputation: 117731

Assuming ASCII, you can simply interpret every character as a number on [0, 128).

Then your words simply are numbers in base-128.

Your transitive requirement is underspecified, but this approach is at least transitive in word length.

Upvotes: 4

Related Questions