Find duplicates in a list of tuples

Question

You are given information about users of your website. The information includes username, a phone number and/or an email. Write a program that takes in a list of tuples where each tuple represents information for a particular user and returns a list of lists where each sublist contains the indices of tuples containing information about the same person. For example:

Input:
[("MLGuy42", "andrew@example.com", "123-4567"),
("CS229DungeonMaster", "123-4567", "ml@example.net"),
("Doomguy", "john@example.org", "carmack@example.com"),
("andrew26", "andrew@example.com", "mlguy@example.com")]

Output:
[[0, 1, 3], [2]]

Since "MLGuy42", "CS229DungeonMaster" and "andrew26" are all the same person.

Each sublist in the output should be sorted and the outer list should be sorted by the first element in the sublist.

Below is the code snippet that I did for this problem. It seems to work fine, but I'm wondering if there is a better/optimized solution. Any help would be appreciated. Thanks!

def find_duplicates(user_info):
    results = list()
    seen = dict()
    for i, user in enumerate(user_info):
        first_seen = True
        key_info = None
        for info in user:
            if info in seen:
                first_seen = False
                key_info = info
                break
        if first_seen:
            results.append([i])
            pos = len(results) - 1
        else:
            index = seen[key_info]
            results[index].append(i)
            pos = index
        for info in user:
            seen[info] = pos
    return results

jihn04 · Accepted Answer

I think I've reached to an optimized working solution using a graph. Basically, I've created a graph with each node contains its user information and its index. Then, use dfs to traverse the graph and find the duplicates.

Find duplicates in a list of tuples

Answers (2)

Related Questions