Bulkilol
Bulkilol

Reputation: 161

Removing list duplicates given indices symmetry in python

In python, given a list mylist of lists el of integers, I would like to remove duplicates that are equivalent under specific permutations of the indices.

The question is more general but I have in mind "decorated" McKay graphs where each node is given an integer defining el whose sum is equal to a certain number. Generating mylist is easy enough, but there is a lot of redundancy depending on the symmetry of the graph. For instance, for the "D4" diagram, el represents the following decorated graph:

                                       a
                                       |
    el = [a, b, c, d, e]     <-->    b-c―d
                                       |
                                       e

The two lists [1,0,4,0,0] and [0,0,4,0,1] are duplicates because there are the same up to a reflection of the graph around the horizontal axis. More generally, the lists el duplicates if there are equivalent under any permutation of a,b,d,e. Is there a simple way of removing duplicates given an arbitrary symmetry?

The way I did it for this particular graph was to sort the indices 0,1,3,4 and compare:

from itertools import product

def check_duplicate(f, li):

    for l in li:
        if l[2] == f[2] and sorted(l[:2] + l[3:]) == sorted(f[:2] + f[3:]):
            return True

    return False

mylist = [list(i) for i in product(range(5), repeat=5) if sum(i) == 5]

newlist = []
for f in my_list:
    if check_duplicate(f, newlist) == False:
        newlist.append(f)

Here tmp is generated brute force, but the condition is more involved in my real case.

This works well enough for this particular example, but is a bit clunky, and the implementation is harder to generalize to more involved cases. Is there a way to remove the duplicates in a more optimized way, in particular one that can easily implement removing duplicates given a particular symmetry of the indices of el?

Upvotes: 1

Views: 40

Answers (1)

Jay Livingston
Jay Livingston

Reputation: 101

It will probably be better to add "seen" cases to a set, and then continue to check if new elements have already been "seen", rather than comparing every element to every other element in the list iteratively (an operation with O(n^2) time complexity). Hopefully I understand what you're trying to accomplish, but here's how I'd do it:

from itertools import product

def to_key(f):
    # Creates a hashable key, keeping order-insensitive values sorted, and noting the middle value"""
    return (f[2], tuple(sorted(f[:2] + f[3:])))

mylist = [list(i) for i in product(range(5), repeat=5) if sum(i) == 5]

# Init an empty set. We'll add to it if a key isn't there
seen = set()
# If we add to the seen set, we'll add the element to newList
newlist = []

for f in mylist:
    key = to_key(f) # Create the hashable key
    if key not in seen: # Check if key in seen
        seen.add(key)
        newlist.append(f)

Upvotes: 0

Related Questions