ChrisJ
ChrisJ

Reputation: 477

Would like to prevent dupes in a python list of lists

I am creating a list of lists and want to prevent dupes. For example, I have:

mainlist = [[a,b],[c,d],[a,d]]

the next item (list) to be added is [b,a] which is considered a duplicate of [a,b].

UPDATE

mainlist  = [[a,b],[c,d],[a,d]]   
swap = [b,a]   

for item in mainlist:
   if set(item) & set(swap):
     print "match was found", item
   else:
     mainlist.append(swap)

Any suggestions as to how I can test whether the next item to be added is already in the list?

Upvotes: 0

Views: 72

Answers (2)

cs95
cs95

Reputation: 402483

Here's an approach using frozensets within a set to check for duplicates. It's a bit ugly since I'm invoking a function that works with global variables.

def add_to_mainlist(new_list):
    if frozenset(new_list) not in dups:
        mainlist.append(new_list)

mainlist = [['a', 'b'],['c', 'd'],['a', 'd']] 

dups = set()

for l in mainlist:
    dups.add(frozenset(l))

print("Before:", mainlist)
add_to_mainlist(['a', 'b'])
print("After:", mainlist)

This outputs:

Before: [['a', 'b'], ['c', 'd'], ['a', 'd']]
After: [['a', 'b'], ['c', 'd'], ['a', 'd']]

Showing that the new list was indeed not added to the original.

Here's a cleaner version that calculates the existing set on the fly inside a function that does everything locally:

def add_to_mainlist(mainlist, new_list):
    dups = set()
    for l in mainlist:
        dups.add(frozenset(l))

    if frozenset(new_list) not in dups:
        mainlist.append(new_list)

    return mainlist

mainlist = [['a', 'b'],['c', 'd'],['a', 'd']] 

print("Before:", mainlist)
mainlist = add_to_mainlist(mainlist, ['a', 'b']) # the assignment isn't needed, but done anyway :-)
print("After:", mainlist)

Why doesn't your existing code work?

This is what you're doing:

...
for item in mainlist:
   if set(item) & set(swap):
     print "match was found", item
   else:
     mainlist.append(swap)

You're intersecting two sets and checking the truthiness of the result. While this might be okay for 0 intersections, in the event that even one of the elements are common (example, ['a', 'b'] and ['b', 'd']), you'd still declare a match which is false.

Ideally you'd want to check the length of the resultant set and make sure its length is equal to than 2:

dups = False 
for item in mainlist:
    if len(set(item) & set(swap)) == 2:
        dups = True
        break
   
if dups == False:
    mainlist.append(swap)

You'd also ideally want a flag to ensure that you did not find duplicates. Your previous code would add without checking all items first.

Upvotes: 3

Chris
Chris

Reputation: 22953

If the order of your inner lists doesn't matter, then this can trivially be accomplished using frozenset()s:

>>> mainlist = [['a', 'b'],['c', 'd'],['a', 'd']]   
>>> mainlist = [frozenset(sublist) for sublist in mainlist]
>>> 
>>> def add_to_list(lst, sublist):
...     if frozenset(sublist) not in lst:
...         lst.append(frozenset(sublist))
... 
>>> mainlist
[frozenset({'a', 'b'}), frozenset({'d', 'c'}), frozenset({'a', 'd'})]
>>> add_to_list(mainlist, ['b', 'a'])
>>> mainlist
[frozenset({'a', 'b'}), frozenset({'d', 'c'}), frozenset({'a', 'd'})]
>>> 

If the order does matter you can either do what @Coldspeed suggested - Construct a set() from your list, construct a frozenset() from the list to be added, and test for membership - or you can use all() and sorted() to test if the list to be added is equivalent to any of the other lists:

>>> def add_to_list(lst, sublist):
...     for l in lst:
...         if all(a == b for a, b in zip(sorted(sublist), sorted(l))):
...             return
...     lst.append(sublist)
... 
>>> mainlist
[['a', 'b'], ['c', 'd'], ['a', 'd']]
>>> add_to_list(mainlist, ['b', 'a'])
>>> mainlist
[['a', 'b'], ['c', 'd'], ['a', 'd']]
>>>

Upvotes: 1

Related Questions