Reputation: 409
I have a string of numbers that i would like to make shorter for use in a URL. This string always consists of numbers only. For example: 9587661771112
In theory, encrypting a numeric string into an alphanumeric(0-9a-zA-Z) string should always return a shorter result, which is what i want.
I've created an algorithm that does the following:
Encrypt ( string1 = numeric input string, string2 = alphanumeric return string)
- Takes the next two characters from string1 and converts them into a number, e.g 95 for the above example
- Checks if the number is less than 52(the combined length of a-z and A-Z)
- if so, adds ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")[Number] to string2 and jump forward by 2 characters
- else, add ("0123456789)[First digit of Number) to string2 and jump forward by 1 character
In the next step the number would be 58 and so on.
With some tweaking the shortest result i could get was: 9587661771112 > j9UQpjva
My problem is that with this technique, the result can vary dramaticaly. I also feel that this is not a clean solution to my problem.
So I need an encryption algorithm that converts a string of numbers into a shorter string of uppercase letters, lowercase letters and numbers. It has to be decryptable and have a more or less consistent result.
Any idea how to achieve this?
Solution:
string Chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
string Base10To62(long N)
{
string R = "";
while (N != 0)
{
R += Chars[(int)(N % 62)];
N /= 62;
}
return R;
}
long Base62To10(string N)
{
long R = 0;
int L = N.Length;
for (int i = 0; i < L; i++)
{
R += Chars.IndexOf(N[i]) * (long)Math.Pow(62, i);
}
return R;
}
works like a charm :)
Upvotes: 4
Views: 6825
Reputation: 409
Solution:
string Chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
private static string Base10To62(string S)
{
string R = "";
var N = long.Parse(S);
do { R += Chars[(int)(N % 0x3E)]; } while ((N /= 0x3E) != 0);
return R;
}
private static string Base62To10(string S)
{
long R = 0;
int L = S.Length;
for (int i = 0; i < L; i++) R += Chars.IndexOf(S[i]) * (long)(System.Math.Pow(0x3E, i));
return R.ToString();
}
Upvotes: 2
Reputation: 112209
If you can add two more characters to your set to make it a nice even 64, then there is a simple, fast algorithm I can describe here.
Encode the digits into a three or four-bit code as follows:
0: 000
1: 001
2: 010
3: 011
4: 100
5: 101
6: 1100
7: 1101
8: 1110
9: 1111
This is a prefix code, which means that you can look at the first three bits to tell if you need to use a fourth. If the first three bits as an integer are greater than 5, then get another bit. So the decoding would be:
get three bits as n
if n < 6
the result is n + '0'
else
n = (n << 1) + one more bit
the result is n - 6 + '0'
The bits are then simply stored six at a time in one of the 64 allowed characters.
This has a problem if you do not know a priori how many digits there are, since there will be ambiguity if you leave four or five bits unused in the last character. In that case, the code can be changed simply to:
0: 000
1: 001
2: 010
3: 011
4: 100
5: 1010
6: 1011
7: 1100
8: 1101
9: 1110
eom: 1111
which takes a few more bits, but provides an unambiguous end-of-message marker.
For the first example, you will store on average 1.76 digits per character. For the second example, 1.71 digits per character, less some amount for the eom marker depending on the number of digits you encode at one time.
If you really can only use 62 characters, then I'll need to think about it a little more.
Update:
A quick look at RFC 1738 indicates that many more characters can be used in a URL:
lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
"i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
"q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
"y" | "z"
hialpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
"J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
"S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
alpha = lowalpha | hialpha
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
"8" | "9"
safe = "$" | "-" | "_" | "." | "+"
extra = "!" | "*" | "'" | "(" | ")" | ","
unreserved = alpha | digit | safe | extra
So adding, say, $ and _ to your set would make it 64.
Upvotes: 1
Reputation: 10222
Linq version for 62 to 10, just for fun:
long Base62To10(string N)
{
return N.Select((t, i) => Chars.IndexOf(t)*(long) Math.Pow(62, i)).Sum();
}
Upvotes: 1