blitter
blitter

Reputation: 369

How to get the elements of a Redis set that do NOT match a global key?

How do I get all keys that do NOT have a corresponding global hash?

I store my main data in hashes with unique keys like so:

person_1: name: Anna, gender: female
person_2: name: Mark, gender: male
person_3: name: Pat, gender: female
person_4: name: Jo, gender: female
person_5: name: Robert, gender: male

Let's say I maintain an index of all male persons as a SET, like so:

male_persons: 2, 5, 6

As you can see, I snuck in an "error", as there is no person_6. Doing a SORT, I can easily build a "relational query" that gives me all males:

SORT "male_persons" BY nosort GET # GET "person_*->name"

Expected output is something like this:

"2", "Mark",
"5", "Robert",
"6", nil

My question is, how can I get in 1 or 2 queries to only the SET keys that have NO corresponding person, i.e. how do I produce only this line in the above example:

"6", nil

or even better:

"6"

or:

person_6

In SQL, this would be something like:

SELECT id FROM male_person_index WHERE male_person_index.id NOT IN (SELECT person.id FROM person);

I know one way of doing this would be to write a Lua script, but this seems so basic that it should be possible to do this in 1 or 2 standard queries, and I am just missing it. In this particular case performance is not super essential, so 1 or 2 or even 3 standard queries issued one after another should suffice.

Update: I wrote a solution as a Lua script, but as stated before, I am not happy with it:

eval 'local out = {}; local i = 1; local exists; for _, key in ipairs(redis.call("SMEMBERS", KEYS[1])) do exists = redis.call("EXISTS", "persons_" .. key); if exists == 0 then out[i] = key .. " -> " .. exists; i = i + 1; end; end; return out;' 1 "male_persons"

Reason I am unhappy is that this is "loopy" and slow and calls EXISTS potentially thousands or millions of times. I still believe this should be done in 2 or 3 transactions, not with 1000s or millions.

An elegant solution would take the output array of the SORT command, STORE it, and then somehow subtract from that result another query result array, in one swoop, to reduces the final output by the keys that are NOT orphaned, giving me, in the end, the orphan keys only.

Is that possible?

Upvotes: 1

Views: 1019

Answers (1)

for_stack
for_stack

Reputation: 22926

You can try the following really really tricky way:

Solutoin

  1. Use your sort solution and store the result into a list: list1.

    SORT "male_persons" BY nosort GET # GET "person_*->name" store list1
    
  2. Sort male_persons by person_*->name in lexicographically order (i.e. ALPHA option), and store the result into a list: list2:

    SORT "male_persons" BY person_*->name ALPHA GET # GET "person_*->name" store list2
    
  3. Remove all empty strings in list1, and get the number of elements removed: num:

    num = LREM list1 0 ""
    
  4. Get the first num * 2 elements from the list2:

    LRANGE list2 0 (num * 2 - 1)
    

Explain:

  1. With the data sorted in lexicographically order, i.e. step 2, we can ensure that the non-existent members are in the front of the result list, i.e. list2, and the corresponding value is an empty string:
127.0.0.1:6379> lrange list2 0 -1
1) "6"
2) ""
3) "2"
4) "Mark"
5) "5"
6) "Robert"
  1. We can remove all empty strings from list1, i.e. step 1 and step 3, so that we can get the number of members that doesn't exist in hash.

NOTE: we DO NOT sort in step 1 to save some cost.

127.0.0.1:6379> lrem list1 0 ""
(integer) 1
  1. Then the first num * 2 elements in list2 is the result, i.e. step 4:
127.0.0.1:6379> lrange list2 0 1
1) "6"
2) ""

Upvotes: 1

Related Questions