mburke13
mburke13

Reputation: 500

Encoding a 128-bit integer in Python?

Inspired by the "encoding scheme" of the answer to this question, I implemented my own encoding algorithm in Python.

Here is what it looks like:

import random
from math import pow
from string import ascii_letters, digits

# RFC 2396 unreserved URI characters
unreserved = '-_.!~*\'()'
characters = ascii_letters + digits + unreserved
size = len(characters)
seq = range(0,size)

# Seed random generator with same randomly generated number
random.seed(914576904)
random.shuffle(seq)

dictionary = dict(zip(seq, characters))
reverse_dictionary = dict((v,k) for k,v in dictionary.iteritems())

def encode(n):
    d = []
    n = n
    while n > 0:
        qr = divmod(n, size)
        n = qr[0]
        d.append(qr[1])
    chars = ''
    for i in d:
        chars += dictionary[i]
    return chars

def decode(str):
    d = []
    for c in str:
        d.append(reverse_dictionary[c])
    value = 0
    for i in range(0, len(d)):
        value += d[i] * pow(size, i)
    return value

The issue I'm running into is encoding and decoding very large integers. For example, this is how a large number is currently encoded and decoded:

s = encode(88291326719355847026813766449910520462)
# print s -> "3_r(AUqqMvPRkf~JXaWj8"
i = decode(s)
# print i -> "8.82913267194e+37"
# print long(i) -> "88291326719355843047833376688611262464"

The highest 16 places match up perfectly, but after those the number deviates from its original.

I assume this is a problem with the precision of extremely large integers when dividing in Python. Is there any way to circumvent this problem? Or is there another issue that I'm not aware of?

Upvotes: 3

Views: 14961

Answers (1)

Niklas B.
Niklas B.

Reputation: 95318

The problem lies within this line:

value += d[i] * pow(size, i)

It seems like you're using math.pow here instead of the built-in pow method. It returns a floating point number, so you lose accuracy for your large numbers. You should use the built-in pow or the ** operator or, even better, keep the current power of the base in an integer variable:

def decode(s):
    d = [reverse_dictionary[c] for c in s]
    result, power = 0, 1
    for x in d:
        result += x * power
        power *= size
    return result

It gives me the following result now:

print decode(encode(88291326719355847026813766449910520462))
# => 88291326719355847026813766449910520462

Upvotes: 4

Related Questions