Reputation: 3523
Because of poor performance when using a Python dict for huge data, I migrated to Redis. I now have the following:
"doc1" => ('989', 4.0), ('99', 4.0), ('990', 4.0), ('991', 4.0), ('992', 4.0), ('993', 4.0), ('994', 4.0), ('995', 4.0), ('996', 4.0), ('997', 4.0), ('998', 4.0), ('999', 4.0)
"doc2" => ('4', 4.0), ('21', 4.0), ('55', 4.0), ('991', 4.0), ('992', 4.0), ('993', 4.0), ('994', 4.0), ('995', 4.0), ('996', 4.0)
"result" => ('991', 8.0), ('992', 8.0), ('993', 8.0), ('994', 8.0), ('995', 8.0), ('996', 8.0), ('99', 4.0),('4', 4.0), ('21', 4.0), ('55', 4.0)
As you can see, I want to combine the two Redis lists into one using Python: if an element of doc1 also exists in doc2, their values should be summed; if an element of doc1 doesn't exist in doc2, it should be added to the result unchanged. My previous implementation using a dict was:
result_array = {k: [db_array.get(k, result_array.get(k))[0],db_array.get(k, dv)[1] + result_array.get(k, dv)[1]] for k in set(db_array) | set(result_array)}
How can I keep the structure of the dictionary?
As you can see, this solution works on a dictionary like this:
{'991': [4.0, 's.text'], '21': [4.0, 't.text'], '990': [4.0, 'b.text']}
but Redis doesn't support nested lists (a list inside a list), so I have to find a different solution.
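A minimal sketch of the merge described above, assuming each document is reduced to a plain mapping from element ID to score (the text field from the earlier dict is left out). The names merge_scores, doc1_pairs and doc2_pairs are hypothetical, and the sample data is only a subset of doc1 and doc2. Shared elements get their scores summed, as in the ('991', 8.0) entries of the example; elements present in only one document keep their score.

```python
# Hypothetical sample data: a subset of "doc1" and "doc2" from the question,
# written as (element, score) pairs.
doc1_pairs = [('989', 4.0), ('99', 4.0), ('991', 4.0), ('992', 4.0)]
doc2_pairs = [('4', 4.0), ('21', 4.0), ('991', 4.0), ('992', 4.0)]


def merge_scores(doc1, doc2):
    """Return the union of two element -> score mappings.

    Elements present in both mappings get the sum of their scores;
    elements present in only one mapping keep their original score.
    """
    merged = dict(doc1)  # copy so the caller's dict is not modified
    for element, score in doc2.items():
        merged[element] = merged.get(element, 0.0) + score
    return merged


result = merge_scores(dict(doc1_pairs), dict(doc2_pairs))

# Print by descending score, then by element ID, for readability.
for element, score in sorted(result.items(), key=lambda kv: (-kv[1], kv[0])):
    print(element, score)
```

For reference, Redis's ZUNIONSTORE computes the same kind of union server-side, because its default weights are 1 and its default aggregation is SUM.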
Upvotes: 2
Reputation: 39223
If the elements (e.g. '989') are unique within each document, you could use a Redis sorted set:
zadd doc1 4.0 989
zadd doc1 4.0 991
zadd doc2 4.0 21
zadd doc2 4.0 991
zinterstore result 2 doc1 doc2
zrange result 0 -1 withscores
1) "991"
2) "8"
This gives you the intersection of the sets (the elements that exist in both sets), with each element's score being the sum of its scores in the individual sets (SUM is ZINTERSTORE's default aggregation).
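To make the semantics concrete, here is a small pure-Python model of what zinterstore result 2 doc1 doc2 followed by zrange result 0 -1 withscores computes, using the same data as the zadd commands above. The helper names zinterstore_sum and zrange_withscores are hypothetical; this models the behaviour with dicts and does not talk to Redis.

```python
def zinterstore_sum(*zsets):
    """Model of ZINTERSTORE with default weights (1) and AGGREGATE SUM.

    Keeps only members present in every input set; each member's score
    is the sum of its scores across the inputs.
    """
    common = set(zsets[0]).intersection(*zsets[1:])
    return {member: sum(z[member] for z in zsets) for member in common}


def zrange_withscores(zset):
    """Model of ZRANGE key 0 -1 WITHSCORES: ascending by score,
    ties broken lexicographically by member."""
    return sorted(zset.items(), key=lambda kv: (kv[1], kv[0]))


# Same data as the zadd commands above (member -> score).
doc1 = {'989': 4.0, '991': 4.0}
doc2 = {'21': 4.0, '991': 4.0}

result = zinterstore_sum(doc1, doc2)
print(zrange_withscores(result))
```

With redis-py 3.0 or later, the real commands would be issued as r.zadd("doc1", {"989": 4.0, "991": 4.0}), r.zinterstore("result", ["doc1", "doc2"]) and r.zrange("result", 0, -1, withscores=True).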
Getting the elements that exist in doc1 but not in doc2 is trickier, since older Redis versions have no zdiff command (ZDIFF and ZDIFFSTORE were added in Redis 6.2, where ZDIFFSTORE only_in_doc1 2 doc1 doc2 does this directly). Depending on your data (and on what the scores are for elements present in both sets), you might do the following, supposing that all scores (what you call "values") are positive and that mutual elements have the same score in both sets:
zunionstore only_in_doc1 2 doc1 doc2 weights 1 -1
zremrangebyscore only_in_doc1 -inf 0
zrange only_in_doc1 0 -1 withscores
1) "989"
2) "4"
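The weighted-union trick above can be modelled in pure Python to show why it needs those two assumptions. This is a sketch; zunionstore_weighted, zremrangebyscore and only_in_first are hypothetical helper names that mimic the Redis commands on dicts, not a Redis client.

```python
def zunionstore_weighted(weighted_zsets):
    """Model of ZUNIONSTORE ... WEIGHTS w1 w2 ... with AGGREGATE SUM.

    weighted_zsets is a list of (zset, weight) pairs; each member's score
    is the sum of score * weight over the input sets that contain it.
    """
    out = {}
    for zset, weight in weighted_zsets:
        for member, score in zset.items():
            out[member] = out.get(member, 0.0) + score * weight
    return out


def zremrangebyscore(zset, low, high):
    """Model of ZREMRANGEBYSCORE: drop members whose score is in [low, high]."""
    return {m: s for m, s in zset.items() if not (low <= s <= high)}


def only_in_first(doc1, doc2):
    """The trick from the answer: weights 1 and -1, then remove scores <= 0."""
    union = zunionstore_weighted([(doc1, 1), (doc2, -1)])
    return zremrangebyscore(union, float('-inf'), 0)


doc1 = {'989': 4.0, '991': 4.0}
doc2 = {'21': 4.0, '991': 4.0}
print(only_in_first(doc1, doc2))

# If a shared member has a different score in each set, the trick breaks:
# '991' survives with score 4.0 - 3.0 = 1.0.
print(only_in_first(doc1, {'21': 4.0, '991': 3.0}))
```

Shared members cancel to exactly 0 only when their scores match, and doc2-only members end up negative only when scores are positive; if either assumption fails, members leak into only_in_doc1 with wrong scores.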
Upvotes: 2