Reputation: 898
I have sets of the ids for individuals keyed by each state
people/stateName:VA = {1,2,3,4,5,6}
people/stateName:TX = {7,8,9,10,11}
...
and I have sets of the ids for individuals who are part of company 1
people/company:1 = {2,6,7,10}
In the above example, if I wanted to find all the people who belong to company 1 and who live the state of VA and TX, I would do:
SUNIONSTORE tempkey people/stateName:VA people/stateName:TX
SINTERSTORE tempkey tempkey people/company:1
In math: (A ∪ B) ∩ C
However, in my case, the number of states is not known, so you would have to iterate over a list of the sets of the states you want, combine them, and then finally intersect that by a company (you'd have to repeat the process if you have more than one company)
foreach( state in state_list ){
SUNIONSTORE(tempkey_state,tempkey_state, 'people/stateName:{state}')
}
foreach( companyNumber in company_list ){
SUNIONSTORE(tempkey_company, tempkey_company, 'people/company:{companyName}')
}
SINTERSTORE(resultkey, tempkey_state, tempkey_company);
In my real scenario, each set is very large, in the 10,000-1,000,000 of members. This process however can be slow (Slower than SQL In some cases)
From my understanding, the bottleneck is SUNIONSTORE
, since it grows with each iteration and it has a big O of O(N)
Are there any ways I can do what I want faster? Some solutions I have in mind
What are your thoughts?
Upvotes: 0
Views: 224
Reputation: 73226
The algebra of sets includes commutative and distributive laws, so that:
(A ∪ B) ∩ C = (C ∩ A) ∪ (C ∩ B)
Redis uses the commutative law to optimize intersection calculation: it sorts the sets per size before applying its algorithm, to minimize the number of operations.
Furthermore, the performance of union and intersection operations is dominated by the cost of object creations (involving memory allocations), rather than the actual union/intersection algorithms.
In your example, I would say the probability of having large sets for states is higher than the probability of having large sets for companies, so I would rather execute:
MULTI
SINTERSTORE tmp1 people/company:1 people/stateName:VA
SINTERSTORE tmp2 people/company:1 people/stateName:TX
SUNION tmp1 tmp2
DEL tmp1 tmp2
EXEC
Here, the only objects actually created in the Redis namespace are already the result of an intersection which likely will produce less objects. Note that the last union does not store the result, but rather return it directly to the client.
Be sure to use a pipeline to also minimize the number of network roundtrips.
If you have several companies, you can apply an union on their sets before (if the average size of company sets is not too high), or you can repeat this pattern several times once per company (if the companies are too large).
Upvotes: 2