Reputation: 16733
I have two list,
l1 = [1,2,3,4,5,6]
l2 = [3,2]
what i want is to remove the element of list l1 which is in l2, for that i have done something like this,
for x in l1:
if x in l2:
l1.remove(x)
it gives output like
[1, 3, 4, 5, 6]
but the output should be like
[1, 4, 5, 6]
can any one shed light on this.
Upvotes: 4
Views: 15610
Reputation: 1886
l1 = [1, 2, 3, 4, 5, 6]
l2 = [3, 2]
[l1.remove(x) for x in l2]
print l1
[1, 4, 5, 6]
Upvotes: 0
Reputation: 123423
FWIW I get significantly different results than @Tim Pietzcker did using what I believe is more realistic input data set and by using a little more rigorous (but otherwise the same) approach to timing different people's answers.
The names and code snippets are the same as Tim's except I added a variation of the one named Lattyware
called Lattyware_rev
which determines what elements to keep rather than reject -- it turned out to be a slower than the former. Note that the two fastest don't preserve the order of l1
.
Here's the latest timing code:
import timeit
setup = """
import random
random.seed(42) # initialize to constant to get same test values
l1 = [random.randrange(100) for _ in xrange(100)]
l2 = [random.randrange(100) for _ in xrange(10)]
"""
stmts = {
"Minion91": """
for x in reversed(l1):
if x in l2:
l1.remove(x)
""",
"mgilson": """
for x in l1[:]: # correction
if x in l2:
l1.remove(x)
""",
"mgilson_set": """
l1 = list(set(l1).difference(l2))
""",
"Lattyware": """
removals = set(l2)
l1 = [item for item in l1 if item not in removals]
""",
"Lattyware_rev": """
keep = set(l1).difference(l2)
l1 = [item for item in l1 if item in keep]
""",
"Latty_mgilson": """
removals = set(l2)
l1[:] = (item for item in l1 if item not in removals)""",
"petr": """
for item in l2:
while item in l1:
l1.remove(item)
""",
"petr (handles dups)": """
l1 = filter(lambda x: x not in l2, l1)
""",
"millimoose": """
for x in l2:
try:
while True: l1.remove(x)
except ValueError: pass
""",
"K.-Michael Aye": """
l1 = list(set(l1) - set(l2))
""",
}
N = 10000
R = 3
timings = [(idea,
min(timeit.repeat(stmts[idea], setup=setup, repeat=R, number=N)),
) for idea in stmts]
longest = max(len(t[0]) for t in timings) # length of longest name
exec(setup) # get an l1 & l2 just for heading length measurements
print('fastest to slowest timings of ideas:\n' +\
' ({:,d} timeit calls, best of {:d} executions)\n'.format(N, R)+\
' len(l1): {:,d}, len(l2): {:,d})\n'.format(len(l1), len(l2)))
for i in sorted(timings, key=lambda x: x[1]): # sort by speed (fastest first)
print "{:>{width}}: {}".format(*i, width=longest)
Output:
fastest to slowest timings of ideas:
(10,000 timeit calls, best of 3 executions)
len(l1): 100, len(l2): 10)
mgilson_set: 0.143126456832
K.-Michael Aye: 0.213544010551
Lattyware: 0.23666971551
Lattyware_rev: 0.466918513924
Latty_mgilson: 0.547516608553
petr: 0.552547776807
mgilson: 0.614238139366
Minion91: 0.728920176815
millimoose: 0.883061820848
petr (handles dups): 0.984093136969
Of course, please let me know if there's something radically wrong that would explain the radically different results.
Upvotes: 0
Reputation: 5605
If the order and loss of duplicates in l1 do not matter:
list(set(l1) - set(l2))
The last list() is only required if you need the result as a list. You could also just use the resulting set, it's also iterable. If you need it ordered you can of course call l.sort() on the resulting list.
Upvotes: 2
Reputation: 39950
You're modifying the list l1
while you're iterating over it, this will cause weird behaviour. (The 3
will get skipped during iteration.)
Either iterate over a copy, or change your algorithm to iterate over l2
instead:
for x in l2:
try:
while True: l1.remove(x)
except ValueError: pass
(This should perform better than testing Nope, this performs terribly as if x in l1
explicitly.)l1
grows in size.
Upvotes: 0
Reputation: 336118
Edit: Removed my original answer because even though it did give correct results, it did so for non-intuitive reasons, and is was not very fast either... so I've just left the timings:
import timeit
setup = """l1 = list(range(20)) + list(range(20))
l2 = [2, 3]"""
stmts = {
"mgilson": """for x in l1[:]:
if x in l2:
l1.remove(x)""",
"petr": """for item in l2:
while item in l1:
l1.remove(item)""",
"Lattyware": """removals = set(l2)
l1 = [item for item in l1 if item not in removals]""",
"millimoose": """for x in l2:
try:
while True: l1.remove(x)
except ValueError: pass""",
"Latty_mgilson": """removals = set(l2)
l1[:] = (item for item in l1 if item not in removals)""",
"mgilson_set": """l1 = list(set(l1).difference(l2))"""
}
for idea in stmts:
print("{0}: {1}".format(idea, timeit.timeit(setup=setup, stmt=stmts[idea])))
Results (Python 3.3.0 64bit, Win7):
mgilson_set: 2.5841989922197333
mgilson: 3.7747968857414813
petr: 1.9669433777815701
Latty_mgilson: 7.262900152285258
millimoose: 3.1890831105541793
Lattyware: 4.573971325181478
Upvotes: 1
Reputation: 2586
Why not making it a bit simpler? No need to actually iterate over l1
if we only want to remove elements present in l2
:
for item in l2:
while item in l1:
l1.remove(item)
This gives you exactly the output desired...
Also, as commenters point out, if there is a possibility that we can have duplicates:
l1 = filter(lambda x: x not in l2, l1)
.. or many other variations using list comprehensions.
Upvotes: 7
Reputation: 309841
You want the outer loop to read:
for x in l1[:]:
...
You can't change a list while iterating over it and expect reasonable results. The above trick makes a copy of l1
and iterates over the copy instead.
Note that if order doesn't matter in the output list, and your elements are unique and hashable, you could use a set:
set(l1).difference(l2)
which will give you a set as output, but you can construct a list from it easily:
l1 = list(set(l1).difference(l2))
Upvotes: 3
Reputation: 1929
This is easily explained like this.
consider the first array you have:
| 1 | 2 | 3 | 4 | 5 | 6 |
Now you start iterating
| 1 | 2 | 3 | 4 | 5 | 6 |
^
Nothing happens, iterator increments
| 1 | 2 | 3 | 4 | 5 | 6 |
^
2 gets removed
| 1 | 3 | 4 | 5 | 6 |
^
iterator increments
| 1 | 3 | 4 | 5 | 6 |
^
And voila, 3 is still there.
The solution is to iterate ove a copy of the vector e.g.
for x in l1[:]: <- slice on entire array
if x in l2:
l1.remove(x)
or to iterate in reverse:
for x in reversed(l1):
if x in l2:
l1.remove(x)
Which acts like this:
| 1 | 2 | 3 | 4 | 5 | 6 |
^
| 1 | 2 | 3 | 4 | 5 | 6 |
^
| 1 | 2 | 4 | 5 | 6 |
^
| 1 | 2 | 4 | 5 | 6 |
^
| 1 | 4 | 5 | 6 |
^
| 1 | 4 | 5 | 6 |
^
Upvotes: 9
Reputation: 88977
As others have said, you can't edit a list while you loop over it. A good option here is to use a list comprehension to create a new list.
removals = set(l2)
l1 = [item for item in l1 if item not in removals]
We make a set as a membership check on a set is significantly faster than on a list.
Upvotes: 2