Atul Arvind
Atul Arvind

Reputation: 16733

Remove Python list element

I have two list,

l1 = [1,2,3,4,5,6]
l2 = [3,2]

what i want is to remove the element of list l1 which is in l2, for that i have done something like this,

for x in l1:
    if x in l2:
        l1.remove(x)

it gives output like

[1, 3, 4, 5, 6]

but the output should be like

[1, 4, 5, 6]

can any one shed light on this.

Upvotes: 4

Views: 15610

Answers (9)

Haresh Shyara
Haresh Shyara

Reputation: 1886

l1 = [1, 2, 3, 4, 5, 6]
l2 = [3, 2]
[l1.remove(x) for x in l2]
print l1
[1, 4, 5, 6]

Upvotes: 0

martineau
martineau

Reputation: 123423

FWIW I get significantly different results than @Tim Pietzcker did using what I believe is more realistic input data set and by using a little more rigorous (but otherwise the same) approach to timing different people's answers.

The names and code snippets are the same as Tim's except I added a variation of the one named Lattyware called Lattyware_rev which determines what elements to keep rather than reject -- it turned out to be a slower than the former. Note that the two fastest don't preserve the order of l1.

Here's the latest timing code:

import timeit

setup = """
import random
random.seed(42) # initialize to constant to get same test values
l1 = [random.randrange(100) for _ in xrange(100)]
l2 = [random.randrange(100) for _ in xrange(10)]
"""

stmts = {
"Minion91": """
for x in reversed(l1):
    if x in l2:
        l1.remove(x)
""",

"mgilson": """
for x in l1[:]: # correction
    if x in l2:
        l1.remove(x)
""",
"mgilson_set": """
l1 = list(set(l1).difference(l2))
""",

"Lattyware": """
removals = set(l2)
l1 = [item for item in l1 if item not in removals]
""",
"Lattyware_rev": """
keep = set(l1).difference(l2)
l1 = [item for item in l1 if item in keep]
""",
"Latty_mgilson": """
removals = set(l2)
l1[:] = (item for item in l1 if item not in removals)""",

"petr": """
for item in l2:
    while item in l1:
        l1.remove(item)
""",
"petr (handles dups)": """
l1 = filter(lambda x: x not in l2, l1)
""",

"millimoose": """
for x in l2:
    try:
        while True: l1.remove(x)
    except ValueError: pass
""",

"K.-Michael Aye": """
l1 = list(set(l1) - set(l2))
""",

}

N = 10000
R = 3

timings = [(idea,
            min(timeit.repeat(stmts[idea], setup=setup, repeat=R, number=N)),
           ) for idea in stmts]
longest = max(len(t[0]) for t in timings)  # length of longest name
exec(setup)  # get an l1 & l2 just for heading length measurements
print('fastest to slowest timings of ideas:\n' +\
      '  ({:,d} timeit calls, best of {:d} executions)\n'.format(N, R)+\
      '   len(l1): {:,d}, len(l2): {:,d})\n'.format(len(l1), len(l2)))
for i in sorted(timings, key=lambda x: x[1]):  # sort by speed (fastest first)
    print "{:>{width}}:  {}".format(*i, width=longest)

Output:

fastest to slowest timings of ideas:
  (10,000 timeit calls, best of 3 executions)
   len(l1): 100, len(l2): 10)

        mgilson_set:  0.143126456832
     K.-Michael Aye:  0.213544010551
          Lattyware:  0.23666971551
      Lattyware_rev:  0.466918513924
      Latty_mgilson:  0.547516608553
               petr:  0.552547776807
            mgilson:  0.614238139366
           Minion91:  0.728920176815
         millimoose:  0.883061820848
petr (handles dups):  0.984093136969

Of course, please let me know if there's something radically wrong that would explain the radically different results.

Upvotes: 0

K.-Michael Aye
K.-Michael Aye

Reputation: 5605

If the order and loss of duplicates in l1 do not matter:

list(set(l1) - set(l2))

The last list() is only required if you need the result as a list. You could also just use the resulting set, it's also iterable. If you need it ordered you can of course call l.sort() on the resulting list.

Upvotes: 2

millimoose
millimoose

Reputation: 39950

You're modifying the list l1 while you're iterating over it, this will cause weird behaviour. (The 3 will get skipped during iteration.)

Either iterate over a copy, or change your algorithm to iterate over l2 instead:

for x in l2:
    try: 
        while True: l1.remove(x)
    except ValueError: pass

(This should perform better than testing if x in l1 explicitly.) Nope, this performs terribly as l1 grows in size.

Upvotes: 0

Tim Pietzcker
Tim Pietzcker

Reputation: 336118

Edit: Removed my original answer because even though it did give correct results, it did so for non-intuitive reasons, and is was not very fast either... so I've just left the timings:

import timeit

setup = """l1 = list(range(20)) + list(range(20))
l2 = [2, 3]"""

stmts = {
"mgilson": """for x in l1[:]:
    if x in l2:
        l1.remove(x)""",
"petr": """for item in l2:
    while item in l1:
        l1.remove(item)""",
"Lattyware": """removals = set(l2)
l1 = [item for item in l1 if item not in removals]""",
"millimoose": """for x in l2:
    try: 
        while True: l1.remove(x)
    except ValueError: pass""",
"Latty_mgilson": """removals = set(l2)
l1[:] = (item for item in l1 if item not in removals)""",
"mgilson_set": """l1 = list(set(l1).difference(l2))"""
}        

for idea in stmts:
    print("{0}: {1}".format(idea, timeit.timeit(setup=setup, stmt=stmts[idea])))

Results (Python 3.3.0 64bit, Win7):

mgilson_set: 2.5841989922197333
mgilson: 3.7747968857414813
petr: 1.9669433777815701
Latty_mgilson: 7.262900152285258
millimoose: 3.1890831105541793
Lattyware: 4.573971325181478

Upvotes: 1

petr
petr

Reputation: 2586

Why not making it a bit simpler? No need to actually iterate over l1 if we only want to remove elements present in l2:

for item in l2:
    while item in l1:
        l1.remove(item)

This gives you exactly the output desired...

Also, as commenters point out, if there is a possibility that we can have duplicates:

l1 = filter(lambda x: x not in l2, l1)

.. or many other variations using list comprehensions.

Upvotes: 7

mgilson
mgilson

Reputation: 309841

You want the outer loop to read:

for x in l1[:]:
   ...

You can't change a list while iterating over it and expect reasonable results. The above trick makes a copy of l1 and iterates over the copy instead.

Note that if order doesn't matter in the output list, and your elements are unique and hashable, you could use a set:

set(l1).difference(l2)

which will give you a set as output, but you can construct a list from it easily:

l1 = list(set(l1).difference(l2))

Upvotes: 3

Minion91
Minion91

Reputation: 1929

This is easily explained like this.

consider the first array you have:

| 1 | 2 | 3 | 4 | 5 | 6 |

Now you start iterating

| 1 | 2 | 3 | 4 | 5 | 6 |
  ^

Nothing happens, iterator increments

| 1 | 2 | 3 | 4 | 5 | 6 |
      ^

2 gets removed

| 1 | 3 | 4 | 5 | 6 |
      ^

iterator increments

| 1 | 3 | 4 | 5 | 6 |
          ^

And voila, 3 is still there.

The solution is to iterate ove a copy of the vector e.g.

for x in l1[:]: <- slice on entire array
    if x in l2:
        l1.remove(x)

or to iterate in reverse:

for x in reversed(l1):
    if x in l2:
        l1.remove(x)

Which acts like this:

| 1 | 2 | 3 | 4 | 5 | 6 |
              ^

| 1 | 2 | 3 | 4 | 5 | 6 |
          ^

| 1 | 2 | 4 | 5 | 6 |
          ^

| 1 | 2 | 4 | 5 | 6 |
      ^

| 1 | 4 | 5 | 6 |
      ^

| 1 | 4 | 5 | 6 |
  ^

Upvotes: 9

Gareth Latty
Gareth Latty

Reputation: 88977

As others have said, you can't edit a list while you loop over it. A good option here is to use a list comprehension to create a new list.

removals = set(l2)
l1 = [item for item in l1 if item not in removals]

We make a set as a membership check on a set is significantly faster than on a list.

Upvotes: 2

Related Questions