Efficiently search whether part of a tuple exists in a list of tuples

Question

I have a list of tuples, each containing 6 numbers ranging from 01 to 99. For example:

tuple_list = [(1, 2, 3, 4, 5, 6), (20, 22, 24, 26, 28, 30), (2, 3, 4, 5, 6, 99)]

For every tuple in this list, I need to efficiently find all other tuples that have at least 5 numbers in common with it (excluding the tuple itself). For the example above, the result would be:

(01,02,03,04,05,06) -> (02,03,04,05,06,99)
(20,22,24,26,28,30) -> []
(02,03,04,05,06,99) -> (01,02,03,04,05,06)

The list itself is big and can hold up to 1,000,000 records.
I tried the naive approach of comparing each tuple against every other tuple, but that is O(n^2) and takes far too long.
I thought about using a dict, but I can't find a way to search for part of a key (a dict would work fine if I only needed exact-key lookups). Maybe some variation of a suffix/prefix tree is needed, but I can't figure it out.
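For reference, the naive approach described above might look like the minimal sketch below. It assumes the numbers inside each tuple are distinct, so "at least 5 numbers in common" can be tested with a set intersection, and that the tuples themselves are distinct (they are used as dict keys). The name `brute_force_matches` is hypothetical. Every pair is compared once, which is n(n-1)/2 comparisons and is where the O(n^2) cost comes from.

```python
def brute_force_matches(tuple_list, min_common=5):
    """Naive O(n^2) baseline: compare every pair of tuples.

    Assumes numbers within a tuple are distinct (so the size of the set
    intersection is the count of shared numbers) and that the tuples
    themselves are distinct (they are used as dict keys).
    """
    sets = [set(t) for t in tuple_list]
    result = {t: [] for t in tuple_list}
    for i in range(len(tuple_list)):
        for j in range(i + 1, len(tuple_list)):
            if len(sets[i] & sets[j]) >= min_common:
                # record the match in both directions
                result[tuple_list[i]].append(tuple_list[j])
                result[tuple_list[j]].append(tuple_list[i])
    return result


tuple_list = [(1, 2, 3, 4, 5, 6), (20, 22, 24, 26, 28, 30), (2, 3, 4, 5, 6, 99)]
print(brute_force_matches(tuple_list))
```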

Any help will be appreciated.

Aziz Sonawalla · Accepted Answer

The code below builds a dict where each key is a combination of 5 of the 6 numbers (stored as a comma-separated string) and each value is the list of all tuples that contain those 5 numbers.
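This works because two tuples of six distinct numbers share at least 5 numbers exactly when they share at least one 5-element subset, and each 6-tuple has only C(6,5) = 6 such subsets. A minimal sketch of the keys one tuple produces, using the standard library's `itertools.combinations` rather than the answer's `getCombos`, and assuming the tuple is sorted first so that equal subsets always yield equal keys (`five_subsets` is a hypothetical name):

```python
from itertools import combinations


def five_subsets(tup):
    # Sort first so the same numbers in a different order always give
    # the same subsets; combinations() preserves the input order.
    return list(combinations(sorted(tup), len(tup) - 1))


a = (1, 2, 3, 4, 5, 6)
b = (2, 3, 4, 5, 6, 99)
print(len(five_subsets(a)))                          # 6
print(set(five_subsets(a)) & set(five_subsets(b)))   # {(2, 3, 4, 5, 6)}
```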

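It makes n·m dictionary insertions, one per (m-1)-element sub-combination, where n is the number of tuples in the list and m is the length of each tuple. Counting the cost of building each key, the total work is O(n·m²); for fixed-size 6-tuples that is 6n insertions, so the running time grows linearly in n rather than quadratically. See the test results below.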
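A back-of-the-envelope comparison for the 1,000,000-record case from the question, assuming each key costs O(m) to build and each dict insertion is O(1) on average:

```latex
\begin{aligned}
\text{bucketing:}\quad & n \cdot \binom{m}{m-1} = n m \ \text{insertions},\ \ O(n m^2)\ \text{total}
  &&\Rightarrow\ 10^6 \cdot 6 = 6 \times 10^6\ \text{insertions for } m = 6 \\
\text{pairwise scan:}\quad & \binom{n}{2} = \frac{n(n-1)}{2}\ \text{comparisons}
  &&\Rightarrow\ \frac{10^6\,(10^6 - 1)}{2} \approx 5 \times 10^{11}\ \text{comparisons}
\end{aligned}
```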

def getCombos(tup):
    """
    Produces all combinations of the tuple with 1 missing
    element from the original
    """
    combos = []
    # sort first so that tuples holding the same numbers in a
    # different order produce identical keys
    sortedTup = sorted(tup)
    for i in range(len(sortedTup)):
        tupAsList = list(sortedTup)
        del tupAsList[i]
        combos.append(tupAsList)
    return combos
    
def getKey(combo):
    """
    Creates a string key for a given combination
    """
    strCombo = [str(i) for i in combo]
    return ",".join(strCombo)

def findMatches(tuple_list):
    """
    Returns dict of tuples that match
    """
    matches = {}

    for tup in tuple_list:
        combos = getCombos(tup)
        for combo in combos:
            key = getKey(combo)
            if key in matches:
                matches[key].append(tup)
            else:
                matches[key] = [tup]
                
    # drop keys shared by fewer than 2 tuples (optional)
    matches = {k: v for k, v in matches.items() if len(v) > 1}

    return matches
    
    
tuple_list = [(1, 2, 3, 4, 5, 6), (20, 22, 24, 26, 28, 30), (2, 3, 4, 5, 6, 99)]

print(findMatches(tuple_list)) # output: {'2,3,4,5,6': [(1, 2, 3, 4, 5, 6), (2, 3, 4, 5, 6, 99)]}
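A more compact sketch of the same bucketing idea, assuming Python 3 and tuples of distinct numbers. Tuples are hashable, so the sorted 5-element sub-tuples can serve as dict keys directly (no string conversion), and `collections.defaultdict(list)` replaces the explicit membership check. `find_matches` is a hypothetical name for this variant, not part of the original answer.

```python
from collections import defaultdict
from itertools import combinations


def find_matches(tuple_list):
    """Map each sorted 5-element sub-tuple to every input tuple containing it."""
    buckets = defaultdict(list)
    for tup in tuple_list:
        # sort so the same numbers in any order produce the same key
        for key in combinations(sorted(tup), len(tup) - 1):
            buckets[key].append(tup)
    # keep only keys shared by at least two tuples
    return {k: v for k, v in buckets.items() if len(v) > 1}


tuple_list = [(1, 2, 3, 4, 5, 6), (20, 22, 24, 26, 28, 30), (2, 3, 4, 5, 6, 99)]
print(find_matches(tuple_list))
# {(2, 3, 4, 5, 6): [(1, 2, 3, 4, 5, 6), (2, 3, 4, 5, 6, 99)]}
```

The keys stay as tuples, which avoids building and joining strings for every combination.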

I tested this code against the brute-force solution. For 1,000 tuples, the brute-force version took 5.5 s, whereas this solution took 0.03 s.

You can rearrange the output into the per-tuple format from the question by iterating through the values, but that may be unnecessary depending on how you use the result.
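A minimal sketch of that rearrangement, assuming the input is the `{key: [tuples]}` dict that `findMatches` returns and that the tuples in the list are distinct. `matches_per_tuple` is a hypothetical helper. The original `tuple_list` is passed in as well so that tuples with no partner still appear with an empty list, as in the question's expected output.

```python
def matches_per_tuple(matches, tuple_list):
    """Regroup a {key: [tuples]} dict (the shape findMatches returns)
    into {tuple: [other tuples sharing at least 5 numbers]}.

    Assumes the tuples in tuple_list are distinct.
    """
    result = {tup: [] for tup in tuple_list}
    for group in matches.values():
        for tup in group:
            for other in group:
                # skip the tuple itself, and don't list a partner twice
                # (e.g. the same six numbers in a different order share
                # every 5-number key)
                if other != tup and other not in result[tup]:
                    result[tup].append(other)
    return result


# output of findMatches for the example list
matches = {'2,3,4,5,6': [(1, 2, 3, 4, 5, 6), (2, 3, 4, 5, 6, 99)]}
tuple_list = [(1, 2, 3, 4, 5, 6), (20, 22, 24, 26, 28, 30), (2, 3, 4, 5, 6, 99)]

result = matches_per_tuple(matches, tuple_list)
for tup, others in result.items():
    print(tup, "->", others)
# (1, 2, 3, 4, 5, 6) -> [(2, 3, 4, 5, 6, 99)]
# (20, 22, 24, 26, 28, 30) -> []
# (2, 3, 4, 5, 6, 99) -> [(1, 2, 3, 4, 5, 6)]
```

The `other not in result[tup]` check is a linear scan, which is fine while groups stay small; for very large groups a set of seen partners per tuple would be the usual substitute.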
