redis reduce memory consumption of string keys of 20-50 character length

Question

I have a key that is generated by concating from many different elements.:

[15,000 unique strings] + [:] + [5 unique strings] + [:] + [1 or 0] + [:] + [15,000 unique strings] + [:] + [5 unique strings] + [:] + [1 or 0] = A String which will be between 20 and 50 characters long (eg: Vancouver:temp:1:Kelowna:high:0)

According to my calculations there will be around 1 billion combinations, each which will be a key. Reading redis documentation (http://redis.io/topics/memory-optimization) they reccomend you hash the keys : eg. "object:11558960" => "1" can become "object:1155" "8960" => "1".

I am thinking of the best way to apply memory optimization. My first idea is to create a numeric representation for the strings. So I would use MySQL and create look-up table where each string would have a corresponding numeric integer. This way I could hash more appropriatly as I could divide numbers more easily then I could String. As well, the numbers would create shorter keys which I think would save memory.The problem here is with 1 billion keys, that is a lot of overhead with MySQL as I would have to create joins and all that.

Another solution I read about is to take my string I create and then compress it using something like php's gzcompres before inserting into the redis. (http://labs.octivi.com/how-we-cut-down-memory-usage-by-82/).

Is there any best practice optimizations I could use to lower my redis memory consumption as currently it is still too high? I am willing to give up CPU power for saving more memory. My values will only be single or double digit integers from 0-50.

sberry · Accepted Answer

The lookup table is totally out, don't even bother. The hash solution seems like it could be well suited for your needs. You would want your key to split right before the 15,000 unique stings to give you enough hash keys to make it worth the effort.

So instead of:

SET Vancouver:temp:1:Kelowna:high:0 10

You would use

HSET Vancouver:temp:1 Kelowna:high:0 10

Now everything after the first [1 or 0] would be a hash key, so you would have approx 150,000 possible keys per hash.

My calculations for your total key space is a bit off from yours:

15000 * 5 * 2 * 15000 * 5 * 2 == 22500000000 (22.5 billion)

So doing it this way you would have 150,000 possible keys (redis keys) with 150,000 possible hash keys each.

The further left you make the break between redis key and hash key the more the numbers skew for hash keys. For instance, if you broke it up like

HSET Vancouver:temp 1:Kelowna:high:0 10

Then you would have 75,000 redis keys for hashes, with each hash possibly holding 300,000 key/value pairs.

Another way you could do it is by using an integer value for your key. If you had integer mappings for each of your two sets of 15,000 unique strings and 5 unique strings, then you could use a total of 34 bits to represent any key. For example.

 0000000000000   000   0   0000000000000   000   0
|      13     | | 3 | |1| |     13      | | 3 | |1|

The 13 bits give you a range of 0-16383 (which covers the 1-15,000 required) The 3 bits give you a range o f 0-7 (which covers the 1-5 required) And the 1 bit gives you the binary 1 or 0 range you need.

So assuming these made up values: Vancuver == 9,987 temp == 3 Kelowna == 3,454 high = 2

You would have:

(9987 << 21) + (3 << 18) + (1 << 17) + (3454 << 4) + (2 << 1) + (0 << 0)
==
20945229796

To get the values back from a given key you just bit-shift and mask

20945229796 >> 20
9987

(20945229796 >> 4) & ((1 << 13) - 1)
3454

Here is a simple python script that converts values to an int, and ints to the values:

values = [9987, 3, 1, 3454, 2, 0]
bits =   [21, 18, 17, 4, 1, 0]

value_and_shift = zip(values, bits)


def key_from_values(values_and_shift):
    return sum(x << y for x, y in value_and_shift)

def extract_values(values_and_shift):
    last_shift = 35
    for value, shift in value_and_shift:
        print "Value should be:", value
        print "Value extracted:", (key >> shift) & ((1 << (last_shift - shift)) - 1)
        print
        last_shift = shift

key = key_from_values(value_and_shift)
print "Using value of:", key

extract_values(value_and_shift)

OUTPUT

Using value of: 20945229796

Value should be: 9987
Value extracted: 9987

Value should be: 3
Value extracted: 3

Value should be: 1
Value extracted: 1

Value should be: 3454
Value extracted: 3454

Value should be: 2
Value extracted: 2

Value should be: 0
Value extracted: 0

redis reduce memory consumption of string keys of 20-50 character length

Answers (1)

Related Questions