bitoiu

Reputation: 7474

Redis: how to store a list of user hashes and retrieve it?

I've started using redis today and I've been through the tutorial and some links on stackoverflow, but I'm failing to understand how to properly use redis for what seems to be a very simple use case.

Goal: Save several users' data into redis and read all of the users at once.

I start a redis client and begin by adding the first user, which has id 1:

127.0.0.1:6379> hmset user:1 name "vitor" age 35
OK
127.0.0.1:6379> hgetall user:1 
1) "name"
2) "vitor"
3) "age"
4) "35"

I add a couple more users, running several commands like this one:

127.0.0.1:6379> hmset user:2 name "nuno" age 10

I was (probably wrongly) expecting to be able to now query all my users by doing:

hgetall "user:"

or even

hgetall "user:*"

The fact that I've not seen anything like this in the tutorials kind of tells me that I'm not using redis right for this use case.

Would you be able to tell me what should be the approach for this use case?

Upvotes: 3

Views: 5574

Answers (2)

Chhavi Gangwal

Reputation: 1176

If you still want to use Redis, you can use something like:

SADD users "{"userId":1,"name":John, "vitor":x,"age:35}"

SADD users "{"userId":2,"name":xt, "vitor":x,"age:43}" ...

And you can retrieve them all using:

SMEMBERS users
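
With the question's two users stored as JSON strings like the above, retrieval might look roughly like this in redis-cli (set members come back in no particular order):

127.0.0.1:6379> SMEMBERS users
1) "{\"userId\":2,\"name\":\"nuno\",\"age\":10}"
2) "{\"userId\":1,\"name\":\"vitor\",\"age\":35}"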

Upvotes: 3

Tw Bert

Reputation: 3809

To understand why these kinds of operations seem non-trivial in NoSQL implementations, it's good to think about why NoSQL exists (and has become very popular) at all.

When you look at an early NoSQL implementation like memcached, the first use case was very simple but very important: a blazingly fast cache for distributed data, for example to cache web page data. Very quickly, features like clustering and sharding were added, so that not all data has to be available on every single node in the cluster at once, but can be gathered on demand.

NoSQL is very different from relational data storage. Don't overuse it. Consider relational databases as well, as they are sometimes far more suited for what you are trying to accomplish. In everything you design, ask yourself "Does this scale well?".

Okay, back to your question. It is in general bad practice to do wildcard searches. Instead, you prepare your data in a way that lets you retrieve it in a scalable way.

Redis is a very chic solution, allowing you to overcome a lot of NoSQL limitations in an elegant way.

If getting "a list of all users" isn't something you have to do very often, or doesn't need to scale well, is always "I really always want all users" because it's for a daily scan anyway, use HSCAN. SCAN operations with a proper batch size don't get in the way of other clients, you can just retrieve your records a couple of thousand at a time, and after a few calls you've got everything.

You can also store your users in a SET. There's no ordering in a set, so no pagination. It can help to keep your user names unique.
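
A sketch of that idea, assuming an index set named users:ids (the key name is just an example) kept next to the per-user hashes; each member maps back to a user:<id> key that you then fetch with HGETALL:

127.0.0.1:6379> SADD users:ids 1 2
(integer) 2
127.0.0.1:6379> SMEMBERS users:ids
1) "1"
2) "2"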

If you want to do things like "get me all users that start with the letter 'a'", I'd use a ZSET. I'd wait a week or two for ZRANGEBYLEX, which is just about to be released and in the works as we speak. Or use an ORM like Josiah Carlson's 'rom' package.
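
A sketch of the lexicographic approach, assuming a key users:names where every member gets the same score so ordering is purely by name; [v (w selects names starting with "v", and - + returns everything in lexicographic order (combine with LIMIT for pagination):

127.0.0.1:6379> ZADD users:names 0 "nuno" 0 "vitor"
(integer) 2
127.0.0.1:6379> ZRANGEBYLEX users:names [v (w
1) "vitor"
127.0.0.1:6379> ZRANGEBYLEX users:names - +
1) "nuno"
2) "vitor"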

When you ask yourself "But now I have to do three calls instead of one when storing my data...?!": yup, that's how it works. If you need atomicity, use a Lua script, or MULTI+EXEC pipelining. Lua is generally easier.
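
For example, writing one user's hash and its index entry atomically with MULTI/EXEC could look like this (the users:ids index set is the same assumption as above):

127.0.0.1:6379> MULTI
OK
127.0.0.1:6379> HMSET user:2 name "nuno" age 10
QUEUED
127.0.0.1:6379> SADD users:ids 2
QUEUED
127.0.0.1:6379> EXEC
1) OK
2) (integer) 1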

You can also ask yourself whether using an HSET is needed at all. Do you need to retrieve the individual data members? Each key or member has some overhead. On top of that, HGETALL has a Big-O specification of O(N), so it doesn't scale well. It might be better to serialize your row as a whole, using JSON or MsgPack, and store it in one HSET member, or just use a simple GET/SET. Also read up on SORT.
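
A minimal sketch of the serialized variant, storing the whole user as one JSON string under a plain string key (the JSON would be built client-side; user:1 is reused here purely for illustration, since in practice you'd pick either the hash layout or the string layout, not both):

127.0.0.1:6379> SET user:1 '{"name":"vitor","age":35}'
OK
127.0.0.1:6379> GET user:1
"{\"name\":\"vitor\",\"age\":35}"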

Hope this helps, TW

Upvotes: 9
