Reputation: 5363

Can someone explain this base conversion code

var ShortURL = new function() {

    var _alphabet = '23456789bcdfghjkmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ-_',
        _base = _alphabet.length;
    this.encode = function(num) {
        var str = '';
        while (num > 0) {
            str = _alphabet.charAt(num % _base) + str;
            num = Math.floor(num / _base);
        }
        return str;
    };

    this.decode = function(str) {
        var num = 0;
        for (var i = 0; i < str.length; i++) {
            num = num * _base + _alphabet.indexOf(str.charAt(i));
        }
        return num;
    };

};

I understand encode works by converting from decimal to custom base (custom alphabet/numbers in this case)

I am not quite sure how decode works. Why do we multiply base by a current number and then add the position number of the alphabet? I know that to convert 010 base 2 to decimal, we would do

(2 * 0^2) + (2 * 1^1) + (2 * 0 ^ 0) = 2

Not sure how it is represented in that decode algorithm

EDIT: My own decode version

this.decode2 = function (str) {
    var result = 0;
    var position = str.length - 1;
    var value;
    for (var i = 0; i < str.length; i++) {
      value = _alphabet.indexOf(str[i]);
      result += value * Math.pow(_base, position--);
    }
    return result;
  }

This is how I wrote my own decode version (Just like I want convert this on paper. I would like someone to explain more in detail how the first version of decode works. Still don't get why we multiply num * base and start num with 0.

Upvotes: 3

Answers (3)

Tomas Langkaas

Reputation: 4731

You ask

I would like to understand how the decode function works from logical perspective. Why are we using num * base and starting with num = 0.

and write that

I am not quite sure how decode works. Why do we multiply base by a current number and then add the position number of the alphabet? I know that to convert 010 base 2 to decimal, we would do

(2 * 0^2) + (2 * 1^1) + (2 * 0 ^ 0) = 2

The decode function uses an approach to base conversion known as Horner's rule, used because it is computationally efficient:

start with a variable set to 0, num = 0
multiply the variable num by the base
take the value of the most significant digit (the leftmost digit) and add it to num,
repeat step 2 and 3 for as long as there are digits left to convert,
the variable num now contains the converted value (in base 10)

Using an example of a hexadecimal number A5D:

start with a variable set to 0, num = 0
multiply by the base (16), num is now still 0
take the value of the most significant digit (the A has a digit value of 10) and add it to num, num is now 10
repeat step 2, multiply the variable num by the base (16), num is now 160
repeat step 3, add the hexadecimal digit 5 to num, num is now 165
repeat step 2, multiply the variable num by the base (16), num is now 2640
repeat step 3, add the hexadecimal digit D to num (add 13)
there are no digits left to convert, the variable num now contains the converted value (in base 10), which is 2653

Compare the expression of the standard approach:

(10 × 16²) + (5 × 16¹) + (13 × 16⁰) = 2653

to the use of Horner's rule:

(((10 × 16) + 5) × 16) + 13 = 2653

which is exactly the same computation, but rearranged in a form making it easier to compute. This is how the decode function works.

Why are we using num * base and starting with num = 0.

The conversion algorithm needs a start value, therefore num is set to 0. For each repetition (each loop iteration), num is multiplied by base. This only has any effect on the second iteration, but is written like this to make it easier to write the conversion as a for loop.

Upvotes: 1

trincot

Reputation: 350300

Both the original and your own version of the decode function achieve the same thing, but the original version does it more efficiently.

In the following assignment:

num = num * _base + _alphabet.indexOf(str.charAt(i));

... there are two parts:

_alphabet.indexOf(str.charAt(i))

The indexOf returns the value of a digit in base _base. You have this part in your own algorithm, so that should be clear.
num * _base

This multiplies the so-far accumulated result. The rest of my answer is about that part:

In the first iteration this has no effect, as num is still 0 at that point. But at the end of the first iteration, num contains the value as if the str only had its left most character. It is the base-51 digit value of the left most digit.

From the next iteration onwards, the result is multiplied by the base, which makes room for the next value to be added to it. It functions like a digit shift.

Take this example input to decode:

bd35

The individual characters represent value 8, 10, 1 and 3. As there are 51 characters in the alphabet, we're in base 51. So bd35 this represents value:

8*51³ + 10*51² +  1*51  +  3

Here is a table with the value of num after each iteration:

                           8
                  8*51  + 10
         8*51² + 10*51  +  1
8*51³ + 10*51² +  1*51  +  3

Just to make the visualisation cleaner, let's put the power of 51 in a column header, and remove that from the rows:

3        2        1        0
----------------------------
                           8
                  8       10
         8       10        1
8       10        1        3

Note how the 8 shifts to the left at each iteration and gets multiplied with the base (51). The same happens with 10, as soon as it is shifted in from the right, and the same with the 1, and 3, although that is the last one and doesn't shift any more.

The multiplication num * _base represents thus a shift of base-digits to the left, making room for a new digit to shift in from the right (through simple addition).

At the last iteration all digits have shifted in their correct position, i.e. they have been multiplied by the base just enough times.

Putting your own algorithm in the same scheme, you'd have this table:

    3        2        1        0
    ----------------------------
    8
    8       10 
    8       10        1
    8       10        1        3

Here, there is no shifting: the digits are immediately put in the right position, i.e. they are multiplied with the correct power of 51 immediately.

Upvotes: 1

Pointy

Reputation: 413737

OK, so what does 376 mean as a base-10 output of your encode() function? It means:

1 * 100 +
5 * 10 +
4 * 1

Why? Because in encode(), you divide by the base on every iteration. That means that, implicitly, the characters pushed onto the string on the earlier iterations gain in significance by a factor of the base each time through the loop.

The decode() function, therefore, multiplies by the base each time it sees a new character. That way, the first digit is multiplied by the base once for every digit position past the first that it represents, and so on for the rest of the digits.

Note that in the explanation above, the 1, 5, and 4 come from the positions of the characters 3, 7, and 6 in the "alphabet" list. That's how your encoding/decoding mechanism works. If you feed your decode() function a numeric string encoded by something trying to produce normal base-10 numbers, then of course you'll get a weird result; that's probably obvious.

edit To further elaborate on the decode() function: forget (for now) about the special base and encoding alphabet. The process is basically the same regardless of the base involved. So, let's look at a function that interprets a base-10 string of numeric digits as a number:

function decode10(str) {
  var num = 0, zero = '0'.charCodeAt(0);
  for (var i = 0; i < str.length; ++i) {
    num = (num * 10) + (str[i] - zero);
  }
  return num;
}

The accumulator variable num is initialized to 0 first, because before examining any characters of the input numeric string the only value that makes sense to start with is 0.

The function then iterates through each character of the input string from left to right. On each iteration, the accumulator is multiplied by the base, and the digit value at the current string position is added.

If the input string is "214", then, the iteration will proceed as follows:

num is set to 0
First iteration: str[i] is 2, so (num * 10) + 2 is 2
Second iteration: str[i] is 1, so (num * 10) + 1 is 21
Third iteration: str[i] is 4, so (num * 10) + 4 is 214

The successive multiplications by 10 achieve what the call to Math.pow() does in your code. Note that 2 is multiplied by 10 twice, which effectively multiplies it by 100.

The decode() routine in your original code does the same thing, only instead of a simple character code computation to get the numeric value of a digit, it performs a lookup in the alphabet string.

Upvotes: 3

Can someone explain this base conversion code

Answers (3)

Related Questions