Joe

Reputation: 47609

Storing integers in a redis ordered set?

I have a system which deals with keys that have been turned into unsigned long integers (by packing short sequences into byte strings). I want to try storing these in Redis, and I want to do it in the best way possible. My concern is mainly memory efficiency.

From playing with the online REPL, I noticed that the following two commands are identical:

zadd myset 1.0 "123"

zadd myset 1.0 123

This means that even if I know I want to store an integer, it has to be stored as a string. I noticed from the documentation that keys are stored simply as char*s, and commands like SETBIT indicate that Redis is not averse to treating strings as byte strings in the client. This hints at a slightly more efficient way of storing unsigned longs than as their string representation.
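A minimal sketch of the byte-string idea in Python, assuming the client sends raw bytes as members (Redis strings are binary-safe). The helper names `to_member` and `from_member` are hypothetical. It compares the decimal string with the shortest big-endian byte string for the largest unsigned long.

```python
def to_member(n: int) -> bytes:
    """Encode an unsigned 64-bit integer as its shortest big-endian byte string."""
    if not 0 <= n < (1 << 64):
        raise ValueError("expected an unsigned 64-bit integer")
    # Use at least one byte so that 0 encodes as b"\x00" rather than b"".
    length = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, "big")


def from_member(b: bytes) -> int:
    """Decode a byte string produced by to_member()."""
    return int.from_bytes(b, "big")


n = 2**64 - 1  # largest unsigned long
print(len(str(n)), len(to_member(n)))  # 20 8
```

Because the encoding never has a leading zero byte (except for 0 itself), each integer maps to exactly one member, so lookups by value stay unambiguous.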

What is the best way to store unsigned longs in sorted sets?

Upvotes: 15

Views: 9900

Answers (2)

Joe

Reputation: 47609

Thanks to Andre for his answer. Here are my findings.

Storing ints directly

Redis keys must be strings. If you want to pass an integer, it has to be some kind of string. For small, well-defined sets of values, Redis will parse the string into an integer, if it is one. My guess is that it will use this int to tailor its hash function (or even statically dimension a hash table based on the value). This works for small values (the defaults being up to 64 entries, with values of up to 512). I will test larger values during my investigation.

http://redis.io/topics/memory-optimization
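For reference, here is a sketch approximating the rule Redis applies (in `string2ll()` in `util.c`) when deciding whether a string can be held in an integer encoding such as an intset. `looks_like_redis_int` is a hypothetical helper and an approximation, not a port of the C code. One consequence for this question: unsigned longs above 2**63 − 1 never qualify, because the integer encodings are signed 64-bit.

```python
import re

# Approximation of the canonical-integer rule: optional '-', no leading
# zeros, no '+', no whitespace, no decimal point, and the value must fit
# in a signed 64-bit long long.
_CANONICAL_INT = re.compile(r"-?[1-9][0-9]*|0")
LLONG_MIN, LLONG_MAX = -(1 << 63), (1 << 63) - 1


def looks_like_redis_int(s: str) -> bool:
    if not _CANONICAL_INT.fullmatch(s):
        return False
    return LLONG_MIN <= int(s) <= LLONG_MAX


for s in ("123", "0123", "+5", "1.0", str(2**63 - 1), str(2**63)):
    print(repr(s), looks_like_redis_int(s))
```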

Storing as strings

The alternative is squashing the integer so it looks like a string.

It looks like it is possible to use any byte string as a key.

In my application's case, it didn't actually make much difference whether I stored the strings or the integers. I imagine that the structures in Redis undergo some kind of alignment, so there may be some pre-wasted bytes anyway. The value is hashed in any case.

I used Python for my testing, so I was able to create the values with struct.pack. Long longs weigh in at 8 bytes, which is quite large. Given the distribution of my integer values, I discovered that it could actually be advantageous to store them as strings, especially when encoded in hex.
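To make the size trade-off concrete, a small Python comparison of `struct.pack("!Q", n)` (always 8 bytes) against a plain hex string. The helper names are illustrative. Hex is shorter than the fixed 8 bytes only while n < 2**28, ties up to 2**32 − 1, and is longer above that (up to 16 characters), so it pays off only when most values are small.

```python
import struct


def packed_len(n: int) -> int:
    # "!Q" = big-endian unsigned long long: always 8 bytes, whatever n is.
    return len(struct.pack("!Q", n))


def hex_len(n: int) -> int:
    # Lowercase hex without a "0x" prefix: one character per 4 bits (minimum 1).
    return len(format(n, "x"))


for n in (0, 255, 2**28 - 1, 2**28, 2**32 - 1, 2**64 - 1):
    print(f"{n:>20}  packed={packed_len(n)}  hex={hex_len(n)}")
```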

As Redis strings are "Pascal-style" (length-prefixed):

struct sdshdr {
    long len;
    long free;
    char buf[];
};

and given that we can store any bytes in there, I wrote a bit of extra Python to pack each number into the shortest possible type:

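from struct import pack

def do_pack(prefix, number):
    """
    Pack the number into the shortest fixed-size big-endian string, preceded
    by a one-byte prefix (prefix must be a bytes object of length 1, e.g. b"k").
    """
    if number < 0:
        raise ValueError("number must be unsigned")

    # unsigned char (1 byte)
    if number < (1 << 8*1):
        return pack("!cB", prefix, number)

    # ushort (2 bytes)
    elif number < (1 << 8*2):
        return pack("!cH", prefix, number)

    # uint (4 bytes)
    elif number < (1 << 8*4):
        return pack("!cI", prefix, number)

    # ulonglong (8 bytes)
    elif number < (1 << 8*8):
        return pack("!cQ", prefix, number)

    raise ValueError("number does not fit in an unsigned long long")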
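do_pack has no inverse in this answer. A hypothetical do_unpack can recover the prefix and the number from the total length alone, because with "!" (no padding) each format produces a different size: 2, 3, 5 or 9 bytes.

```python
import struct

# Total packed length -> format used by do_pack (1 prefix byte + payload).
_FORMATS = {2: "!cB", 3: "!cH", 5: "!cI", 9: "!cQ"}


def do_unpack(packed: bytes):
    """Return (prefix, number) for a string produced by do_pack."""
    try:
        fmt = _FORMATS[len(packed)]
    except KeyError:
        raise ValueError(f"unexpected packed length {len(packed)}") from None
    return struct.unpack(fmt, packed)
```

Usage: `do_unpack(struct.pack("!cH", b"k", 300))` gives back `(b"k", 300)`.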

Packing the numbers this way made an insignificant saving, if any, probably because of struct padding in Redis. It also drove Python's CPU usage through the roof, which makes it somewhat unattractive.
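To put the padding argument in perspective, a rough Python estimate of what one string costs under the sdshdr layout quoted above (two longs, then the bytes, then a trailing NUL). It deliberately ignores allocator size classes and the per-member bookkeeping of the sorted set itself (object header, skiplist node, dict entry), both of which may well exceed the handful of bytes saved by packing. `SdsHdr` and `sds_bytes` are illustrative names only.

```python
import ctypes


class SdsHdr(ctypes.Structure):
    # Mirrors the sdshdr struct quoted above; the flexible array
    # member buf[] contributes nothing to sizeof().
    _fields_ = [("len", ctypes.c_long), ("free", ctypes.c_long)]


def sds_bytes(payload_len: int, free: int = 0) -> int:
    """Rough size of one sds string: header + payload + spare space + NUL."""
    return ctypes.sizeof(SdsHdr) + payload_len + free + 1


candidates = {
    "decimal": str(2**64 - 1).encode(),      # 20 bytes
    "hex": format(2**64 - 1, "x").encode(),  # 16 bytes
    "do_pack (!cQ)": bytes(9),               # stand-in of the packed length
}
for label, payload in candidates.items():
    print(f"{label:>14}: payload={len(payload):2d}  sds~{sds_bytes(len(payload))}")
```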

The data I was working with was 200,000 zsets, each keyed by a consecutive integer and holding 100 (weight, random integer) entries, plus an inverted index (based on random data). DBSIZE reports 1,200,001 keys.

Final memory use of the server: 1.28 GB RAM, 1.32 GB virtual. Various tweaks made a difference of no more than 10 MB either way.
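A back-of-the-envelope reading of these figures, assuming each of the 200,000 zsets holds 100 members and treating GB/MB as powers of two. Since the inverted index and key overhead are lumped in, the per-member figure is an upper bound. It shows why a 10 MB swing is negligible: it amounts to about half a byte per member.

```python
# Figures from the test described above (assumptions noted in comments).
zsets = 200_000
members_per_zset = 100            # assumed: 100 entries per zset
rss_bytes = 1.28 * 1024**3        # 1.28 GB resident, GB taken as 2**30 bytes
swing_bytes = 10 * 1024**2        # largest difference seen between tweaks

members = zsets * members_per_zset
per_member = rss_bytes / members            # upper bound: includes everything
swing_per_member = swing_bytes / members

print(f"{members:,} members, at most ~{per_member:.0f} bytes each")
print(f"10 MB swing = {swing_per_member:.2f} bytes per member")
```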

So my conclusion:

Don't bother encoding into fixed-size data types. Just store the integer as a string, in hex if you want. It won't make all that much difference.
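A minimal sketch of that conclusion, with hypothetical helper names. The `{member: score}` mapping shape is what redis-py's `zadd()` accepts from version 3.0 onward; nothing here talks to a server.

```python
def hex_member(n: int) -> str:
    """Encode a non-negative integer as a lowercase hex member (no '0x')."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    return format(n, "x")


def from_hex_member(member) -> int:
    """Decode a member returned by Redis (bytes or str) back to an int."""
    if isinstance(member, bytes):
        member = member.decode("ascii")
    return int(member, 16)


# Build the {member: score} mapping for a batch of (integer -> weight) pairs.
weights = {123: 1.0, 2**64 - 1: 2.5}
mapping = {hex_member(n): w for n, w in weights.items()}
print(mapping)
```

With a client `r = redis.Redis()` this could be sent as `r.zadd("myset", mapping)`. Members come back as bytes unless the client is created with `decode_responses=True`, which is why `from_hex_member` accepts both.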

References:

http://docs.python.org/library/struct.html

http://redis.io/topics/internals-sds

Upvotes: 12

Andre

Reputation: 3181

I'm not sure about this answer; it's more of a suggestion than anything else. I'd have to try it to see whether it works.

As far as I can tell, Redis only supports UTF-8 strings.

I would suggest taking the bit representation of your long integer and padding it to fill the nearest byte. Encode each group of 8 bits as a UTF-8 character (ending up with a string of up to 8 UTF-8 characters) and store that in Redis. Because the values are unsigned, you don't need to care about the first (sign) bit, but if you did, you could add a flag to the string.

Upon retrieving the data, you have to remember to pad each character back to 8 bits, as UTF-8 will use fewer bytes for a character that can be represented in fewer bytes.

The end result is that you store a maximum of 8 × 8-bit characters instead of (possibly) a maximum of 64 × 8-bit characters.
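A Python sketch of this suggestion, assuming "one character per 8-bit group" means code points 0 to 255. The helper names are hypothetical. It also shows the catch: UTF-8 needs two bytes for code points 128 to 255, so a full 64-bit value can cost up to 16 bytes on the wire rather than 8. Redis strings are binary-safe, so sending the raw bytes directly avoids that overhead.

```python
def to_utf8_member(n: int) -> bytes:
    """Map each 8-bit group of n to one character (code point 0-255), then UTF-8 encode."""
    # Pad the bit representation to the nearest whole byte (at least one byte).
    raw = n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")
    text = "".join(chr(b) for b in raw)  # one character per byte
    return text.encode("utf-8")


def from_utf8_member(data: bytes) -> int:
    """Inverse of to_utf8_member: every code point fits back into one byte."""
    text = data.decode("utf-8")
    return int.from_bytes(bytes(ord(c) for c in text), "big")


for n in (1, 127, 128, 2**64 - 1):
    print(n, len(to_utf8_member(n)))
```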

Upvotes: 3
