Reputation: 110570
I am trying to remove duplicates from two lists, so I wrote this code:
a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
b = ["ijk", "lmn", "opq", "rst", "123", "456", ]
for i in b:
    if i in a:
        print("found " + i)
        b.remove(i)
print(b)
But I find that a matching item that immediately follows another matched item does not get removed.
I get a result like this:
found ijk
found opq
['lmn', 'rst', '123', '456']
but I expect a result like this:
['123', '456']
How can I fix my function to do what I want?
Thank you.
Upvotes: 18
Views: 69987
Reputation: 11
a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
b = ["ijk", "lmn", "opq", "rst", "123", "456","abc"]
for i in a:
    if i in b:
        print("found", i)
        b.remove(i)
print(b)
output:
found abc
found ijk
found lmn
found opq
found rst
['123', '456']
Upvotes: 0
Reputation: 39
You can use a list comprehension:
a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
b = ["ijk", "lmn", "opq", "rst", "123", "456", ]
Duplicate values removed from a:
c = [value for value in a if value not in b]
Duplicate values removed from b:
c = [value for value in b if value not in a]
Upvotes: 0
Reputation: 31
You can use lambda functions.
f = lambda list1, list2: list(filter(lambda element: element not in list2, list1))
It returns the elements of list1 that do not also appear in list2:
>>> a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
>>> b = ["ijk", "lmn", "opq", "rst", "123", "456"]
>>> f(a, b)
['abc', 'def', 'xyz']
>>> f(b, a)
['123', '456']
Upvotes: 3
Reputation: 382
Along the lines of 7stud's answer: if you go through the list in reverse order, you don't run into the problem you encountered:
a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
b = ["ijk", "lmn", "opq", "rst", "123", "456", ]
for i in reversed(b):
    if i in a:
        print("found " + i)
        b.remove(i)
print(b)
Output:
found rst
found opq
found lmn
found ijk
['123', '456']
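A related variant, sketched here on the assumption that deleting by position is acceptable, walks the indices from the end. Each deletion then only shifts items that have already been checked. The name `idx` is just illustrative.

```python
a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
b = ["ijk", "lmn", "opq", "rst", "123", "456"]

# Walk the indices from last to first; deleting b[idx] only moves
# items at higher positions, which have already been examined.
for idx in range(len(b) - 1, -1, -1):
    if b[idx] in a:
        del b[idx]

print(b)  # ['123', '456']
```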
Upvotes: 0
Reputation: 48599
Here is what's going on. Suppose you have this list:
['a', 'b', 'c', 'd']
and you are looping over every element in the list. Suppose you are currently at index position 1:
['a', 'b', 'c', 'd']
       ^
       |
   index = 1
...and you remove the element at index position 1, giving you this:
['a',      'c', 'd']
       ^
       |
   index = 1
After removing the item, the other items slide to the left, giving you this:
['a', 'c', 'd']
       ^
       |
   index = 1
Then when the loop runs again, the loop increments the index to 2, giving you this:
['a', 'c', 'd']
            ^
            |
        index = 2
See how you skipped over 'c'? The lesson is: never delete an element from a list that you are looping over.
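The skipping can be reproduced with the lists from the question. This is a small Python 3 sketch; the list `visited` is a hypothetical helper added only to record which items the loop actually looked at.

```python
a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
b = ["ijk", "lmn", "opq", "rst", "123", "456"]

visited = []  # hypothetical helper: records every item the loop saw
for item in b:
    visited.append(item)
    if item in a:
        # Removing shifts later items left, but the loop's internal
        # index still advances, so the next item is never visited.
        b.remove(item)

print(visited)  # ['ijk', 'opq', '123', '456']
print(b)        # ['lmn', 'rst', '123', '456']
```

'lmn' and 'rst' never show up in `visited`, which is why they survive in `b`.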
Upvotes: 38
Reputation: 7799
What about:
b = set(b) - set(a)
If you need possible repetitions in b to also appear repeated in the result and/or order to be preserved, then
b = [x for x in b if x not in a]
would do.
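If a is long, the `in a` test scans the whole list once for every element of b. One possible refinement, assuming the items are hashable (strings are), is to build a set from a once and test membership against it. The names `a_set` and `result`, and the extra "123" at the end of b, are illustrative additions.

```python
a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
# An extra "123" is appended to show that repeats in b survive.
b = ["ijk", "lmn", "opq", "rst", "123", "456", "123"]

a_set = set(a)  # built once; membership tests are then O(1) on average
result = [x for x in b if x not in a_set]

print(result)  # ['123', '456', '123']
```

Order and repetitions in b are kept, because the set is used only for lookups, not for the result.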
Upvotes: 28
Reputation: 19805
There are already many answers on "how can you fix it?", so this is a "how can you improve it and be more pythonic?": since what you want to achieve is to get the difference between list b and list a, you should use the difference operation on sets (operations on sets):
>>> a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
>>> b = ["ijk", "lmn", "opq", "rst", "123", "456", ]
>>> s1 = set(a)
>>> s2 = set(b)
>>> s2 - s1
set(['123', '456'])
Upvotes: 0
Reputation: 1025
One way of avoiding the problem of editing a list while you iterate over it is to use a comprehension:
a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
b = ["ijk", "lmn", "opq", "rst", "123", "456", ]
b = [x for x in b if x not in a]
Upvotes: 2
Reputation: 8437
You asked to remove the duplicates from both lists; here's my solution:
from collections import OrderedDict
a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
b = ["ijk", "lmn", "opq", "rst", "123", "456", ]
x = OrderedDict.fromkeys(a)
y = OrderedDict.fromkeys(b)
# Iterate over a snapshot of the keys so x can be modified inside the loop
for k in list(x):
    if k in y:
        x.pop(k)
        y.pop(k)
print(list(x.keys()))
print(list(y.keys()))
Result:
['abc', 'def', 'xyz']
['123', '456']
The nice thing here is that you keep the order of the items in both lists.
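On Python 3.7 and later a plain dict also preserves insertion order, so a similar result should be possible without OrderedDict. The following is only a sketch of that idea, not the approach above; `common`, `a_unique` and `b_unique` are illustrative names.

```python
a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
b = ["ijk", "lmn", "opq", "rst", "123", "456"]

# dict.fromkeys keeps first-seen order and drops repeats within each list
x = dict.fromkeys(a)
y = dict.fromkeys(b)

# Key views support set operations; this is the set of shared keys
common = x.keys() & y.keys()

a_unique = [k for k in x if k not in common]
b_unique = [k for k in y if k not in common]

print(a_unique)  # ['abc', 'def', 'xyz']
print(b_unique)  # ['123', '456']
```

Because nothing is removed while iterating, no snapshot of the keys is needed here.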
Upvotes: 6
Reputation: 113978
Or use a set:
set(b).difference(a)
Be forewarned that sets will not preserve order, if that is important.
Upvotes: 3
Reputation: 34493
Your problem seems to be that you're changing the list you're iterating over. Iterate over a copy of the list instead.
for i in b[:]:
    if i in a:
        b.remove(i)
>>> b
['123', '456']
But how about using a list comprehension instead?
>>> a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
>>> b = ["ijk", "lmn", "opq", "rst", "123", "456", ]
>>> [elem for elem in b if elem not in a ]
['123', '456']
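Note that the comprehension builds a new list and leaves b itself untouched unless the result is assigned back. If other references to the same list object need to see the change, one option is slice assignment, sketched below; `alias` is a hypothetical second reference used only for the demonstration.

```python
a = ["abc", "def", "ijk", "lmn", "opq", "rst", "xyz"]
b = ["ijk", "lmn", "opq", "rst", "123", "456"]
alias = b  # hypothetical second reference to the same list object

# Slice assignment replaces the contents of the existing list in place;
# the comprehension on the right is fully built before b is modified.
b[:] = [elem for elem in b if elem not in a]

print(b)           # ['123', '456']
print(alias is b)  # True
```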
Upvotes: 38