Suyog Kale

Reputation: 373

What is the better data structure to store user profiles in Redis?

I want to store users' profiles in Redis, as I have to frequently read multiple users' profiles. There are two options I see at present:

Option 1: store a separate hash key per user's profile

Option 2: use a single hash key to store all users' profiles

Please tell me which option is best, considering the following:

  1. performance
  2. memory utilization
  3. reading multiple users' profiles - for batch processing I should be able to read profiles 1-100, then 101-200, and so on, at a time
  4. larger dataset - what if there are millions of user profiles?

Upvotes: 5

Views: 3256

Answers (3)

Imaskar

Reputation: 2939

PROS for option 1

(But don't use a hash; use a single plain key per user, like SET profile:4d094f58c96767d7a0099d49 {...})

  • Iterating over keys is slightly faster than iterating over hash fields. (That's also why you should modify option 1 to use SET, not HSET.)
  • Retrieving a key's value is slightly faster than retrieving a hash field.
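A minimal sketch of option 1 in Python, assuming redis-py, the profile:<id> key naming from above, and JSON-serialized profiles (the function names are illustrative, not part of the original answer):

import json
import redis

r = redis.Redis(host='localhost', port=6379, db=0)

def save_profile(user_id, profile):
    # one plain key per user
    r.set(f'profile:{user_id}', json.dumps(profile))

def load_profile(user_id):
    raw = r.get(f'profile:{user_id}')
    return json.loads(raw) if raw else None

def load_profiles(user_ids):
    # MGET fetches many keys in a single round trip
    raws = r.mget([f'profile:{uid}' for uid in user_ids])
    return [json.loads(raw) for raw in raws if raw]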

PROS for option 2

  • You can get all users in a single call (HGETALL, or HMGET with the ids you need), but only if your user base is not very big. Otherwise it can be too hard for the server to serve you the result.
  • You can flush all users with a single command. Useful if you have a backing DB.
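A minimal sketch of option 2 under the same assumptions (the single hash name profiles is illustrative):

import json
import redis

r = redis.Redis(host='localhost', port=6379, db=0)

def save_profile(user_id, profile):
    # every user lives in one big hash
    r.hset('profiles', user_id, json.dumps(profile))

def load_all_profiles():
    # HGETALL pulls the whole hash in one call; fine for a small user base,
    # increasingly heavy on the server as it grows
    return {uid.decode(): json.loads(raw)
            for uid, raw in r.hgetall('profiles').items()}

def flush_profiles():
    # a single DEL removes every profile at once
    r.delete('profiles')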

PROS for option 3

Option 3 is to break your user data into hash buckets determined by a hash of the user id. It works well if you have many users and do batches often. Like this:

HSET profiles:<bucket> <id> {json object}
HGET profiles:<bucket> <id>
HGETALL profiles:<bucket>

The last one retrieves a whole bucket of profiles. I don't recommend letting a bucket exceed about 1 MB in total. This works well with sequential ids, and not so well with hashed ids, because buckets can grow unevenly large. If you use it with hashed ids and a bucket grows so much that it slows your Redis, you can fall back to HSCAN (as in option 2) or redistribute objects into more buckets with a new hash function.

  • Faster batch load
  • Slightly slower single object store/load

My recommendation, if I understood your situation right, is to use the 3rd option with sequential ids bucketed in ranges of 100. And if you are aiming at high volumes of data, plan for a cluster from day one.
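A minimal sketch of that bucketing scheme in Python, assuming redis-py, sequential integer ids, and the bucket size of 100 suggested above:

import json
import redis

r = redis.Redis(host='localhost', port=6379, db=0)
BUCKET_SIZE = 100

def bucket_for(user_id):
    # users 0-99 land in bucket 0, 100-199 in bucket 1, and so on
    return user_id // BUCKET_SIZE

def save_profile(user_id, profile):
    r.hset(f'profiles:{bucket_for(user_id)}', user_id, json.dumps(profile))

def load_profile(user_id):
    raw = r.hget(f'profiles:{bucket_for(user_id)}', user_id)
    return json.loads(raw) if raw else None

def load_bucket(bucket):
    # HGETALL returns the whole bucket in one call, e.g. users 100-199 for bucket 1
    return {int(uid): json.loads(raw)
            for uid, raw in r.hgetall(f'profiles:{bucket}').items()}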

Upvotes: 0

Adarsh

Reputation: 3573

As Sergio Tulentsev pointed out, it's not good to store all the users' data (especially if the dataset is huge) inside one single hash by any means.

Storing the users' data as individual keys is also not preferred if you're looking for memory optimization, as pointed out in this blog post.

Reading the users' data with a pagination mechanism calls for a database rather than a simple caching system like Redis. Hence it's recommended to use a NoSQL database such as MongoDB for this.

But reading from the database each time is a costly operation, especially if you're reading a lot of records.

Hence the best solution would be to cache the most active users' data in Redis to eliminate the database fetch overhead.

I recommend looking into walrus.

It basically follows this pattern:

@cache.cached(timeout=expiry_in_secs)
def function_name(param1, param2, ...., param_n):
    # perform database fetch
    # return user data

This ensures that frequently accessed or requested user data stays in Redis, and the function automatically returns the value from Redis instead of making the database call. The cached value also expires after the configured timeout, so stale entries don't linger.

You set it up as follows:

from walrus import *
db = Database(host='localhost', port=6379, db=0)
cache = db.cache()  # the cache object used by the @cache.cached decorator above

where host can be the hostname of a Redis server or cluster running remotely.
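Putting the two snippets together, a runnable sketch might look like this (get_user_profile and its body are placeholders for your own database fetch, not part of walrus itself):

from walrus import Database

db = Database(host='localhost', port=6379, db=0)
cache = db.cache()

@cache.cached(timeout=600)  # keep each result in Redis for 10 minutes
def get_user_profile(user_id):
    # placeholder for the real database fetch (e.g. a MongoDB query);
    # the decorator stores the return value in Redis and serves it from
    # there on later calls until the timeout expires
    return {'id': user_id, 'name': 'example user'}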

Hope this helps.

Upvotes: 3

r.pedrosa

Reputation: 749

Option #1.

  • Performance: Typically it depends on your use case, but let's say you want to read a specific user (on login/logout, for authorization purposes, etc.). With option #1, you simply compute the user's key and get the user profile. With option #2, you need to fetch all users' profiles and parse the JSON (although you can make it efficient, it will never be as efficient and simple as option #1);

  • Memory utilization: You can make option #1 and option #2 take the same size in Redis (in option #1, you can avoid storing the hash/user id as part of the JSON). However, taking the same example of loading a specific user, you only need to keep a single user profile JSON in code/memory instead of a bigger JSON with a whole set of user profiles;

  • Read multiple users' profiles - for batch processing I should be able to read profiles 1-100, 101-200 at a time: For this, as is typically done with a relational database, you want to do paging. There are different ways of doing paging with Redis, but using a SCAN operation is an easy way to iterate over a set of users (see the sketch at the end of this answer);

  • Larger dataset - what if there are millions of user profiles:

Redis is an in-memory but persistent-on-disk database, so it represents a different trade-off where very high write and read speed is achieved, with the limitation that data sets can't be larger than memory.

If you "can't have a dataset larger the memory", you can look to Partitioning as the Redis FAQ suggests. On the Redis FAQ you can also check other metrics such as the "maximum number of keys a single Redis instance can hold" or "Redis memory footprint"

Upvotes: 2
