Reputation: 2065
Let's say there is a certain way of encrypting strings:
For example, the word FRUIT is encrypted in the following manner:
We append the character $ at the end of the word:
FRUIT$
We then form all the strings by moving the first character at the end:
FRUIT$
RUIT$S
UIT$FR
IT$FRU
T$FRUI
$FRUIT
Then we sort the new strings into alphabetical order:
$FRUIT
FRUIT$
IT$FRU
RUIT$F
T$FRUI
UIT$FR
The encrypted string:
T$UFIR
Now my problem is obvious: How to decrypt a given string into it's original form.
I've been pounding my head for half a week now and I've finally run out of paper.
How should I get on with this?
What I have discovered:
if we have the last step of the encryption:
$FRUIT
FRUIT$
IT$FRU
RUIT$F
T$FRUI
UIT$FR
We can know the first and last character of the original string, since the rightmost column is the encrypted string itself, and the leftmost column is always in alphabetical order. The last character is the first character of the encrypted string, because $ is always first in the alphabet, and it only exists once in a string. Then, if we find the $ character from the rightmost column, and look up the character on the same row in the leftmost column, we get the first character.
So what we can know about the encrypted string T$UFIR is that the original string is F***T$, where * is an unknown character.
There ends my ideas. Now I have to utilize the world-wide-web and ask another human being: How?
You could say this is homework, and being familiar with my tutor, I place my bets on this being a dynamic programming -problem.
Upvotes: 3
Views: 1596
Reputation: 16582
This is the Burrows-Wheeler transform.
It's an algorithm typically used for aiding compression algorithms, as it tends to group together common repeating phrases, and is reversible.
To decode your string:
Number each character:
T$UFIR
012345
Now sort, retaining the numbering. If characters repeat, you use the indices as a secondary sort-key, such that the indices for the repeated characters are kept in increasing order, or otherwise use a sorting algorithm that guarantees this.
$FIRTU
134502
Now we can decode. Start at the '$', and use the associated index as the next character to output ('$' = 1, so the next char is 'F'. 'F' is 3, so the next char is 'R', etc...)
The result:
$FRUIT
So just remove the marker character, and you're done.
Upvotes: 15