Reputation: 593
I have written some code in Python to delete unique numbers from a list, so given the input:
[1,2,3,2,1]
It should return
[1,2,2,1]
But my program returns
[1,2,1]
My code is:
for i in data:
    if data.count(i) == 1:
        data.pop(i)
I found the error occurs at the line if data.count(i) == 1:. It says data.count(2) == 1, when clearly there are 2 occurrences of the number 2 in the list. I do not understand why this is giving the wrong answer.
Upvotes: 4
Views: 1037
Reputation: 411
Using del instead of pop:
data = [1,2,3,2,1]
for i in data:
    if data.count(i) == 1:
        index = data.index(i)
        del data[index]
print(data)
produces,
[1, 2, 2, 1]
[Program finished]
Upvotes: 0
Reputation: 9977
This is a recurring problem: you misunderstood list.pop(). It takes an index, not a specific element, so you're not removing what you expect.
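To see the difference, compare pop with remove, which is the method that does take a value:

```python
data = [1, 2, 3, 2, 1]

# pop() takes an index: this removes the element at position 2 (the value 3)
data.pop(2)
print(data)  # [1, 2, 2, 1]

# remove() takes a value: this removes the first occurrence of 2
data.remove(2)
print(data)  # [1, 2, 1]
```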
The thing to do here is to use enumerate:
data = [1,2,3,2,1]
# You could use dup_list = data[:] for Python 3.2 and below
dup_list = data.copy()
for index, item in enumerate(dup_list):
    if dup_list.count(item) == 1:
        data.pop(index)
This way you're popping the item at the right index.
EDIT
I edited, thanks to @wim for his comment. I'm now iterating over a copy (dup_list) of the original list, so as not to iterate over and mutate the same list at the same time. Also, I created the copy explicitly for the sake of explanation, but you could use a shorter version of the code:
data = [1,2,3,2,1]
# using data[:] to iterate over a copy
for index, item in enumerate(data[:]):
    if data.count(item) == 1:
        data.pop(index)
Note that I added a comment because this syntax can be confusing for some people.
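If the slicing idiom is new to you, here is a minimal check of what data[:] actually does:

```python
data = [1, 2, 3]
clone = data[:]       # a full slice builds a brand-new list
print(clone == data)  # True  -- same contents
print(clone is data)  # False -- different objects
data.append(4)
print(clone)          # [1, 2, 3] -- the copy is unaffected by later mutation
```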
Upvotes: 3
Reputation: 2545
I believe a more Pythonic answer could be to use a list comprehension:
result = [x for x in data if data.count(x) > 1]
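On the question's input this keeps every value that appears more than once, in the original order:

```python
data = [1, 2, 3, 2, 1]
result = [x for x in data if data.count(x) > 1]
print(result)  # [1, 2, 2, 1]
```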
I have moved C.Nivis's and Patrick Artner's answers into functions to run timeit on them more easily. To account for the time required to call a function, I have also wrapped the list comprehension in a function call.
def remove_singletons(data):
    """Return list with no singletons, using a for loop."""
    res = []
    for i in data:
        if data.count(i) > 1:
            res.append(i)
    return res

def remove_singletons_lc(data):
    """Return list with no singletons, using a list comprehension."""
    return [x for x in data if data.count(x) > 1]
from collections import Counter

def remove_singletons_counter(data):
    c = Counter(data)
    return [x for x in data if c[x] > 1]
import numpy as np

def remove_singletons_numpy(data):
    a = np.array(data)
    _, ids, counts = np.unique(a, return_counts=True, return_inverse=True)
    return a[counts[ids] != 1]
l = [1,2,3,2,1]
%timeit remove_singletons(l)
>>> 1.42 µs ± 46.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit remove_singletons_lc(l)
>>> 1.2 µs ± 17.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Counter
%timeit remove_singletons_counter(l)
>>> 6.55 µs ± 143 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
numpy.unique
%timeit remove_singletons_numpy(l)
>>> 53.8 µs ± 3.07 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
It appears that the list comprehension is slightly but consistently faster than the loop, and considerably faster than Counter, on small lists. Numpy, for small lists, is the slowest.
Suppose that we have a large list of n random elements, drawn from [0, n]:
import random
n = 10000
l = [random.randint(0, n) for i in range(n)]
%timeit remove_singletons(l)
>>> 1.5 s ± 64.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit remove_singletons_lc(l)
>>> 1.51 s ± 33.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Counter
%timeit remove_singletons_counter(l)
>>> 2.65 ms ± 228 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
numpy.unique
%timeit remove_singletons_numpy(l)
>>> 1.75 ms ± 38.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
For large lists the undisputed winner is numpy.unique, followed just behind by Counter. For small lists a list comprehension seems to do the trick, but for larger lists the numpy.unique approach works best.
Upvotes: 3
Reputation: 59274
Do not modify a list while iterating over it; the behavior will most likely not be the one you want.
One more option is to use numpy.unique with return_counts=True:
a = np.array([1,2,2,3,2,1])
_, ids, counts = np.unique(a, return_counts=True, return_inverse=True)
a[counts[ids] != 1]
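Step by step, this is what the intermediate arrays look like (note that np.unique always returns the optional outputs in the fixed order index, inverse, counts, regardless of keyword order in the call):

```python
import numpy as np

a = np.array([1, 2, 2, 3, 2, 1])
uniq, ids, counts = np.unique(a, return_counts=True, return_inverse=True)
# uniq        -> [1 2 3]        the sorted unique values
# counts      -> [2 3 1]        occurrences of each unique value
# ids         -> [0 1 1 2 1 0]  index into uniq for every element of a
# counts[ids] -> [2 3 3 1 3 2]  per-element occurrence count
print(a[counts[ids] != 1])      # [1 2 2 2 1]
```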
For big arrays this is way faster than a list comprehension and Counter:
from collections import Counter
a = np.array([1,2,2,3,2,1]*1000)  # numpy array
b = list(a)  # plain list
c = Counter(b)  # Counter, used in the last timing below
Then
%timeit _, ids, c = np.unique(a, return_counts=True, return_inverse=True);a[c[ids] != 1]
225 µs ± 11.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit [x for x in a if b.count(x) > 1]
885 ms ± 23.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit [x for x in a if c[x] > 1]
1.53 ms ± 58.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Upvotes: 1
Reputation: 51663
If you have a long list, you should put all the numbers into a Counter(iterable) dictionary.
from collections import Counter
data = [1,2,3,2,1]
c = Counter(data)
cleaned = [x for x in data if c[x] > 1]
print(cleaned)
This counts all occurrences in one pass of your list (O(n)), and looking up how often a value occurs in the created dictionary is O(1). Together this is much faster than using a list comprehension like
result = [x for x in data if data.count(x) > 1]
which, for a list of 100 values, goes through your 100 values 100 times to count each single one of them. That is O(n^2) - a bad thing to have.
Output:
[1,2,2,1]
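If you want to verify the difference yourself, here is a rough benchmark sketch (the function names are mine, and absolute numbers depend on your machine):

```python
import random
import timeit
from collections import Counter

random.seed(0)
n = 2000
data = [random.randint(0, n) for _ in range(n)]

def keep_dups_count(data):
    # rescans the whole list for every element: O(n^2)
    return [x for x in data if data.count(x) > 1]

def keep_dups_counter(data):
    # one counting pass, then O(1) lookups: O(n)
    c = Counter(data)
    return [x for x in data if c[x] > 1]

print(timeit.timeit(lambda: keep_dups_counter(data), number=3))
print(timeit.timeit(lambda: keep_dups_count(data), number=3))
```

Both functions return the same list; only the cost of counting differs.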
Upvotes: 6
Reputation: 4151
You could use a list comprehension to build a new list:
[x for x in data if data.count(x) > 1]
Also, the pop() method takes the index of the element as an argument, not the value.
Upvotes: -1
Reputation: 13106
Try appending to a new list rather than changing your old one:
res = []
data = [1,2,3,2,1]
for i in data:
    if data.count(i) > 1:
        res.append(i)
It's bad practice to change a list's size during iteration, and pop will do exactly that. This returns res = [1, 2, 2, 1].
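To see the kind of bug this avoids, try the in-place removal on an input with two adjacent singletons:

```python
data = [1, 3, 4, 1]          # 3 and 4 are both singletons
for i in data:
    if data.count(i) == 1:
        data.remove(i)       # shrinks the list while we iterate over it
print(data)                  # [1, 4, 1] -- the 4 was skipped, not removed
```

When 3 is removed, every later element shifts left by one, so the iterator steps right past the 4. Building a separate result list sidesteps this entirely.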
Upvotes: 4