badc0re

Reputation: 3523

merge and sum redis ordered sets

Because Python dicts performed poorly on my very large data set, I migrated to Redis. I now have the following:

"doc1" =>  ('989', 4.0), ('99', 4.0), ('990', 4.0), ('991', 4.0), ('992', 4.0), ('993', 4.0), ('994', 4.0), ('995', 4.0), ('996', 4.0), ('997', 4.0), ('998', 4.0), ('999', 4.0)

"doc2" =>  ('4', 4.0), ('21', 4.0), ('55', 4.0), ('991', 4.0), ('992', 4.0), ('993', 4.0), ('994', 4.0), ('995', 4.0), ('996', 4.0)

"result" => ('991', 8.0), ('992', 8.0), ('993', 8.0), ('994', 8.0), ('995', 8.0), ('996', 8.0), ('99', 4.0),('4', 4.0), ('21', 4.0), ('55', 4.0)

As you can see, I want to combine the two Redis lists into one using Python: if an element of doc1 also exists in doc2, sum their values; if an element of doc1 does not exist in doc2, add it to the result as is. My previous implementation using dicts was:

result_array = {k: [db_array.get(k, result_array.get(k))[0],db_array.get(k, dv)[1] + result_array.get(k, dv)[1]] for k in set(db_array) | set(result_array)}

how to keep the structure of the dictionary

As you can see, this solution is for data shaped like this:

{'991': [4.0, 's.text'], '21': [4.0, 't.text'], '990': [4.0, 'b.text']}

But Redis doesn't support a list inside a list, so I have to find a different solution.
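For reference, the merge rule described above can be sketched in plain Python, without Redis. This is only an illustration of the desired result: the helper name `merge_and_sum` and the shortened sample data are assumptions made for this example, not part of the original code.

```python
def merge_and_sum(first, second):
    """Union of two (member, score) lists; scores of shared members are added."""
    totals = {}
    for member, score in list(first) + list(second):
        totals[member] = totals.get(member, 0.0) + score
    return totals


# Shortened sample data in the same shape as doc1 and doc2 above.
sample_doc1 = [("989", 4.0), ("991", 4.0), ("992", 4.0)]
sample_doc2 = [("21", 4.0), ("991", 4.0), ("992", 4.0)]

merged = merge_and_sum(sample_doc1, sample_doc2)

# Highest score first, then member name, to mirror the "result" listing.
print(sorted(merged.items(), key=lambda kv: (-kv[1], kv[0])))
# [('991', 8.0), ('992', 8.0), ('21', 4.0), ('989', 4.0)]
```

Members present in both inputs end up with 8.0, and members present in only one keep their original 4.0.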

Upvotes: 2

Views: 1735

Answers (1)

Linus Thiel

Reputation: 39223

If the values are unique, you could use a Redis sorted set:

zadd doc1 4.0 989
zadd doc1 4.0 991

zadd doc2 4.0 21
zadd doc2 4.0 991

zinterstore result 2 doc1 doc2
zrange result 0 -1 withscores
1) "991"
2) "8"

This gives you the intersection of the sets (the elements which exist in both sets), where each score is the sum of the element's scores in each set.
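To make the intersection behaviour concrete, here is a small pure-Python model of it. It does not talk to Redis; the function name `zinter_sum` is hypothetical, and plain dicts of member to score stand in for sorted sets.

```python
def zinter_sum(*sorted_sets):
    """Model of an intersection with summed scores on dicts of member -> score."""
    # Members that appear in every input set.
    common = set(sorted_sets[0]).intersection(*sorted_sets[1:])
    # Each surviving member gets the sum of its scores across all inputs.
    return {member: sum(s[member] for s in sorted_sets) for member in common}


model_doc1 = {"989": 4.0, "991": 4.0}
model_doc2 = {"21": 4.0, "991": 4.0}

intersection = zinter_sum(model_doc1, model_doc2)
print(intersection)  # {'991': 8.0}
```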

Getting the elements which exist in doc1 but not in doc2 is trickier, since Redis versions before 6.2 have no zdiff. Depending on your data (and on what the scores are for elements present in both sets), you might do this (supposing all scores (what you call "values") are positive, and scores for mutual elements are the same in both sets):

zunionstore only_in_doc1 2 doc1 doc2 weights 1 -1
zremrangebyscore only_in_doc1 -inf 0
zrange only_in_doc1 0 -1 withscores
1) "989"
2) "4"
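The reasoning behind the weights trick can be checked with a pure-Python model. This is a sketch under the same assumptions as above (all scores positive, shared members have equal scores); the function names are hypothetical and dicts stand in for sorted sets.

```python
def zunion_weighted(sets_with_weights):
    """Model of a weighted union: each score is multiplied by its set's weight, then summed."""
    totals = {}
    for scores, weight in sets_with_weights:
        for member, score in scores.items():
            totals[member] = totals.get(member, 0.0) + weight * score
    return totals


def only_in_first(first, second):
    """Members of `first` that are absent from `second`, via weights 1 and -1."""
    combined = zunion_weighted([(first, 1), (second, -1)])
    # Mirrors removing the score range -inf..0: drop every member with score <= 0.
    return {member: score for member, score in combined.items() if score > 0}


diff_doc1 = {"989": 4.0, "991": 4.0}
diff_doc2 = {"21": 4.0, "991": 4.0}

difference = only_in_first(diff_doc1, diff_doc2)
print(difference)  # {'989': 4.0}
```

Shared members cancel out to 0 and members found only in the second set go negative, so both are removed. If a shared member had a higher score in the first set than in the second, for example 5.0 versus 4.0, it would survive with a score of 1.0, which is why the equal-scores assumption matters.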

Upvotes: 2
