Najiva
Najiva

Reputation: 162

Searching for a matching items in between two sets with "complex in" condition

Lets say I have two sets:

set1 = set(("a", "b", 5), ("a", "c", 3))
set2 = set(("a", "b", 3), ("a", "b", 7))

Each item has 3 values:

  1. first index (first part of the key) e.g. "a"
  2. second index (second part of the key) e.g. "b"
  3. third index (timestamp last part of the key) e.g 5

I want to search for all of the elements of set 1, that have the corresponding item in set2, where the corresponding item is any item that has the same value at both the first and second index, but the value of the timestamp of the matching item from the set2 must be lower than the timestamp of the item from the set1.

From the example the matching item would be:

  1. ("a", "b", 5) ("a", "b", 3) timestamp of 3 < 5
  2. ✓ ("a", "b", 5) ("a", "b", 7)
  3. ("a", "c", 3) ("a", "b", 3) "c" != "b"
  4. ("a", "c", 3) ("a", "b", 7) "c" != "b"

Now the problem is I need to search for the items in set2 in n.log(n) time. That is why I am trying to keep the values in the set. If I was looking for the items without the timestamp:

set1 = set(("a", "b"), ("a", "c"))
set2 = set(("a", "b"), ("a", "b"))

I could do:

for item from set1:
  if item in set2:
    print("Found matching item")

But with the timestamp, I can't figure out how to implement this without two for loops.

Upvotes: 1

Views: 539

Answers (2)

Jay
Jay

Reputation: 2471

Here's an O(n) solution with O(n) additional space to store the dict2 -

set1 = {("a", "b", 5), ("a", "c", 3)}
set2 = {("a", "b", 3), ("a", "b", 7)}

dict2 = {}
for item in set2:
    dict2[item[:2]] = max(dict2.get(item[:2], float('-inf')), item[2])

for item in set1:
    if dict2.get(item[0:2], float('-inf')) > item[2]:
        print("Found match")
        print("Match -", item, item[0:2] + (dict2.get(item[0:2]),))

outputs -

Found match
Match - ('a', 'b', 5) ('a', 'b', 7)

We create a new dict2 which stores the two keys and the max timestamp seen in the set2, and then we loop through items in set1 to see if for the same key if the max timestamp is larger than the timestamp in set1 for the same key.

Upvotes: 1

Barmar
Barmar

Reputation: 781380

You can create a dictionary from set2 where the key is the first two parts of the tuple, and the value is a list of all the timestamps. Then you can find the matching key and test if the timestamp in set1 is lower than any of them.

d = {}
for item in set2:
    d.setdefault(item[0:2], []).append(item[2])

for item in set1:
    if any(item[2] > x for x in set2.get(item[0:2], [])):
        print("Found match")

Upvotes: 1

Related Questions