Reputation: 6252
I have a key that is generated by concating from many different elements.:
[15,000 unique strings] + [:] + [5 unique strings] + [:] + [1 or 0] + [:] + [15,000 unique strings] + [:] + [5 unique strings] + [:] + [1 or 0] = A String which will be between 20 and 50 characters long (eg: Vancouver:temp:1:Kelowna:high:0)
According to my calculations there will be around 1 billion combinations, each which will be a key. Reading redis documentation (http://redis.io/topics/memory-optimization) they reccomend you hash the keys : eg. "object:11558960" => "1" can become "object:1155" "8960" => "1".
I am thinking of the best way to apply memory optimization. My first idea is to create a numeric representation for the strings. So I would use MySQL and create look-up table where each string would have a corresponding numeric integer. This way I could hash more appropriatly as I could divide numbers more easily then I could String. As well, the numbers would create shorter keys which I think would save memory.The problem here is with 1 billion keys, that is a lot of overhead with MySQL as I would have to create joins and all that.
Another solution I read about is to take my string I create and then compress it using something like php's gzcompres before inserting into the redis. (http://labs.octivi.com/how-we-cut-down-memory-usage-by-82/).
Is there any best practice optimizations I could use to lower my redis memory consumption as currently it is still too high? I am willing to give up CPU power for saving more memory. My values will only be single or double digit integers from 0-50.
Upvotes: 3
Views: 2356
Reputation: 132068
The lookup table is totally out, don't even bother. The hash solution seems like it could be well suited for your needs. You would want your key to split right before the 15,000 unique stings to give you enough hash keys to make it worth the effort.
So instead of:
SET Vancouver:temp:1:Kelowna:high:0 10
You would use
HSET Vancouver:temp:1 Kelowna:high:0 10
Now everything after the first [1 or 0] would be a hash key, so you would have approx 150,000 possible keys per hash.
My calculations for your total key space is a bit off from yours:
15000 * 5 * 2 * 15000 * 5 * 2 == 22500000000 (22.5 billion)
So doing it this way you would have 150,000 possible keys (redis keys) with 150,000 possible hash keys each.
The further left you make the break between redis key and hash key the more the numbers skew for hash keys. For instance, if you broke it up like
HSET Vancouver:temp 1:Kelowna:high:0 10
Then you would have 75,000 redis keys for hashes, with each hash possibly holding 300,000 key/value pairs.
Another way you could do it is by using an integer value for your key. If you had integer mappings for each of your two sets of 15,000 unique strings and 5 unique strings, then you could use a total of 34 bits to represent any key. For example.
0000000000000 000 0 0000000000000 000 0
| 13 | | 3 | |1| | 13 | | 3 | |1|
The 13 bits give you a range of 0-16383 (which covers the 1-15,000 required) The 3 bits give you a range o f 0-7 (which covers the 1-5 required) And the 1 bit gives you the binary 1 or 0 range you need.
So assuming these made up values: Vancuver == 9,987 temp == 3 Kelowna == 3,454 high = 2
You would have:
(9987 << 21) + (3 << 18) + (1 << 17) + (3454 << 4) + (2 << 1) + (0 << 0)
==
20945229796
To get the values back from a given key you just bit-shift and mask
20945229796 >> 20
9987
(20945229796 >> 4) & ((1 << 13) - 1)
3454
Here is a simple python script that converts values to an int, and ints to the values:
values = [9987, 3, 1, 3454, 2, 0]
bits = [21, 18, 17, 4, 1, 0]
value_and_shift = zip(values, bits)
def key_from_values(values_and_shift):
return sum(x << y for x, y in value_and_shift)
def extract_values(values_and_shift):
last_shift = 35
for value, shift in value_and_shift:
print "Value should be:", value
print "Value extracted:", (key >> shift) & ((1 << (last_shift - shift)) - 1)
print
last_shift = shift
key = key_from_values(value_and_shift)
print "Using value of:", key
extract_values(value_and_shift)
OUTPUT
Using value of: 20945229796
Value should be: 9987
Value extracted: 9987
Value should be: 3
Value extracted: 3
Value should be: 1
Value extracted: 1
Value should be: 3454
Value extracted: 3454
Value should be: 2
Value extracted: 2
Value should be: 0
Value extracted: 0
Upvotes: 3