Reputation: 5363
var ShortURL = new function() {
var _alphabet = '23456789bcdfghjkmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ-_',
_base = _alphabet.length;
this.encode = function(num) {
var str = '';
while (num > 0) {
str = _alphabet.charAt(num % _base) + str;
num = Math.floor(num / _base);
}
return str;
};
this.decode = function(str) {
var num = 0;
for (var i = 0; i < str.length; i++) {
num = num * _base + _alphabet.indexOf(str.charAt(i));
}
return num;
};
};
I understand encode works by converting from decimal to custom base (custom alphabet/numbers in this case)
I am not quite sure how decode works. Why do we multiply base by a current number and then add the position number of the alphabet? I know that to convert 010 base 2 to decimal, we would do
(2 * 0^2) + (2 * 1^1) + (2 * 0 ^ 0) = 2
Not sure how it is represented in that decode algorithm
EDIT: My own decode version
this.decode2 = function (str) {
var result = 0;
var position = str.length - 1;
var value;
for (var i = 0; i < str.length; i++) {
value = _alphabet.indexOf(str[i]);
result += value * Math.pow(_base, position--);
}
return result;
}
This is how I wrote my own decode version (Just like I want convert this on paper. I would like someone to explain more in detail how the first version of decode works. Still don't get why we multiply num * base and start num with 0.
Upvotes: 3
Views: 255
Reputation: 4731
You ask
I would like to understand how the decode function works from logical perspective. Why are we using num * base and starting with num = 0.
and write that
I am not quite sure how decode works. Why do we multiply base by a current number and then add the position number of the alphabet? I know that to convert 010 base 2 to decimal, we would do
(2 * 0^2) + (2 * 1^1) + (2 * 0 ^ 0) = 2
The decode function uses an approach to base conversion known as Horner's rule, used because it is computationally efficient:
num = 0
num
by the basenum
,num
now contains the converted value (in base 10)Using an example of a hexadecimal number A5D
:
num = 0
num
is now still 0A
has a digit value of 10
) and add it to num
, num
is now 10num
by the base (16), num
is now 1605
to num
, num
is now 165num
by the base (16), num
is now 2640D
to num
(add 13)num
now contains the converted value (in base 10), which is 2653Compare the expression of the standard approach:
(10 × 162) + (5 × 161) + (13 × 160) = 2653
to the use of Horner's rule:
(((10 × 16) + 5) × 16) + 13 = 2653
which is exactly the same computation, but rearranged in a form making it easier to compute. This is how the decode
function works.
Why are we using num * base and starting with num = 0.
The conversion algorithm needs a start value, therefore num
is set to 0. For each repetition (each loop iteration), num
is multiplied by base
. This only has any effect on the second iteration, but is written like this to make it easier to write the conversion as a for
loop.
Upvotes: 1
Reputation: 350300
Both the original and your own version of the decode
function achieve the same thing, but the original version does it more efficiently.
In the following assignment:
num = num * _base + _alphabet.indexOf(str.charAt(i));
... there are two parts:
_alphabet.indexOf(str.charAt(i))
The indexOf
returns the value of a digit in base _base
. You have this part in your own algorithm, so that should be clear.
num * _base
This multiplies the so-far accumulated result. The rest of my answer is about that part:
In the first iteration this has no effect, as num
is still 0 at that point. But at the end of the first iteration, num
contains the value as if the str
only had its left most character. It is the base-51 digit value of the left most digit.
From the next iteration onwards, the result is multiplied by the base, which makes room for the next value to be added to it. It functions like a digit shift.
Take this example input to decode
:
bd35
The individual characters represent value 8, 10, 1 and 3. As there are 51 characters in the alphabet, we're in base 51. So bd35
this represents value:
8*51³ + 10*51² + 1*51 + 3
Here is a table with the value of num
after each iteration:
8
8*51 + 10
8*51² + 10*51 + 1
8*51³ + 10*51² + 1*51 + 3
Just to make the visualisation cleaner, let's put the power of 51 in a column header, and remove that from the rows:
3 2 1 0
----------------------------
8
8 10
8 10 1
8 10 1 3
Note how the 8 shifts to the left at each iteration and gets multiplied with the base (51). The same happens with 10, as soon as it is shifted in from the right, and the same with the 1, and 3, although that is the last one and doesn't shift any more.
The multiplication num * _base
represents thus a shift of base-digits to the left, making room for a new digit to shift in from the right (through simple addition).
At the last iteration all digits have shifted in their correct position, i.e. they have been multiplied by the base just enough times.
Putting your own algorithm in the same scheme, you'd have this table:
3 2 1 0
----------------------------
8
8 10
8 10 1
8 10 1 3
Here, there is no shifting: the digits are immediately put in the right position, i.e. they are multiplied with the correct power of 51 immediately.
Upvotes: 1
Reputation: 413737
OK, so what does 376
mean as a base-10 output of your encode()
function? It means:
1 * 100
+5 * 10
+4 * 1
Why? Because in encode()
, you divide by the base on every iteration. That means that, implicitly, the characters pushed onto the string on the earlier iterations gain in significance by a factor of the base each time through the loop.
The decode()
function, therefore, multiplies by the base each time it sees a new character. That way, the first digit is multiplied by the base once for every digit position past the first that it represents, and so on for the rest of the digits.
Note that in the explanation above, the 1
, 5
, and 4
come from the positions of the characters 3
, 7
, and 6
in the "alphabet" list. That's how your encoding/decoding mechanism works. If you feed your decode()
function a numeric string encoded by something trying to produce normal base-10 numbers, then of course you'll get a weird result; that's probably obvious.
edit To further elaborate on the decode()
function: forget (for now) about the special base and encoding alphabet. The process is basically the same regardless of the base involved. So, let's look at a function that interprets a base-10 string of numeric digits as a number:
function decode10(str) {
var num = 0, zero = '0'.charCodeAt(0);
for (var i = 0; i < str.length; ++i) {
num = (num * 10) + (str[i] - zero);
}
return num;
}
The accumulator variable num
is initialized to 0 first, because before examining any characters of the input numeric string the only value that makes sense to start with is 0.
The function then iterates through each character of the input string from left to right. On each iteration, the accumulator is multiplied by the base, and the digit value at the current string position is added.
If the input string is "214", then, the iteration will proceed as follows:
num
is set to 0str[i]
is 2
, so (num * 10) + 2
is 2
str[i]
is 1
, so (num * 10) + 1
is 21
str[i]
is 4
, so (num * 10) + 4
is 214
The successive multiplications by 10 achieve what the call to Math.pow()
does in your code. Note that 2
is multiplied by 10 twice, which effectively multiplies it by 100.
The decode()
routine in your original code does the same thing, only instead of a simple character code computation to get the numeric value of a digit, it performs a lookup in the alphabet string.
Upvotes: 3