Reputation: 161
I'm trying to use a bitmask to provide as many binary values as possible so that the final value will store in the limited allocate memory for a string. My current methodology is to find a maximum number and convert it to a string base-36.
value = (0 | (1<<1318)).to_s(36)
The result is 255 chars of a compressed number from which I can extract my original number of 1318. The downside is I'm limited to 1,318 binary values and I want to expand that number. Are there any alternative strategies in Ruby to compress this number even further?
Upvotes: 1
Views: 289
Reputation: 110725
Numbers are non-negative
If the numbers are non-negative we can encode each 8-bits of each number to a character that is part of a string, and then decode the string by converting each character to 8 bits of the number.
def encode(n)
str = ''
until n.zero?
str << (n & 255).chr
n = n >> 8
end
str.reverse
end
def decode(str)
str.each_char.reduce(0) { |n,c| (n << 8) | c.ord }
end
This uses the following bit-manipulation methods in the class Integer: &
, >>
, <<
and |
.
def test(n)
encoded = encode(n)
puts "#{n} => #{encoded} => #{decode(encoded)}"
end
test 1 # 1 => ["\u0001"] => 1
test 63 # 63 => ["?"] => 63
test 64 # 64 => ["@"] => 64
test 255 # 255 => ["\xFF"] => 255
test 256 # 256 => ["\u0001", "\u0000"] => 256
test 123456 # 123456 => ["\x01", "\xE2", "@"] => 123456
For example,
n = 123456
n.to_s(2)
#=> "11110001001000000"
so
n = 0b11110001001000000
#=> 123456
The bytes of this number can be visualized so:
00000001 11100010 01000000
We see that
a = [0b00000001, 0b11100010, 0b01000000]
a.map(&:chr)
#=> ["\x01", "\xE2", "@"]
Numbers can be negative
If the numbers to be encoded can be negative we need to first convert then to their absolute values then add some information to the encoded string that indicates whether they are non-negative or negative. That will require at least one additional byte so we might include a "+"
for non-negative numbers and a "-"
for negative numbers.
def encode(n)
sign = "+"
if n < 0
sign = "-"
n = -n
end
str = ''
until n.zero?
str << (n & 255).chr
n = n >> 8
end
(str << sign).reverse
end
def decode(str)
n = str[1..-1].each_char.reduce(0) { |n,c| (n << 8) | c.ord }
str[0] == '+' ? n : -n
end
test -255 # -255 => ["-", "\xFF"] => -255
test -256 # -256 => ["-", "\u0001", "\u0000"] => -256
test -123456 # -123456 => ["-", "\x01", "\xE2", "@"] => -123456
test 123456 # 123456 => ["+", "\x01", "\xE2", "@"] => 123456
Upvotes: 1
Reputation: 484
You can always encode your number into base s
and then represent that as string with whatever alphabet you want.
def encode(n, alphabet)
s = alphabet.size
res = []
while (n > 0)
res << n % s
n = n / s
end
res.reverse.map { |i| alphabet[i] }.join
end
Your method is then equivalent to encode(n, alphabet)
, where alphabet
is defined as
alphabet = ((0..9).to_a + ("a".."z").to_a).join
# => "0123456789abcdefghijklmnopqrstuvwxyz"
But you might as well use all possible characters instead of only 36 of them:
extended_alphabet = (0..255).map { |i| i.chr }.join
This gives a total of (256 ** 255) possibilities, i.e. up to (2 ** 2040), which is much better than your actual (2 ** 1318).
This encoding happens to be optimal because each character of your string can have at most 256 different values, and all of them are used here.
Decoding can then be performed as follows:
def decode(encoded, alphabet)
s = alphabet.size
n = 0
decode_dict = {}; i = -1
alphabet.each_char { |c| decode_dict[c] = (i += 1) }
encoded.each_char do |c|
n = n * s + decode_dict[c]
end
n
end
If you are going to use a fixed alphabet for all your encodings, I would suggest computing the decoding dictionnary outside of the function and taking it as a parameter instead of alphabet
, to avoid computing it every time you try to encode a number.
Upvotes: 1