Reputation: 593
I have written some code in Python to delete unique numbers from a list, so given the input:
[1,2,3,2,1]
It should return
[1,2,2,1]
But my program returns
[1,2,1]
My code is:
for i in data:
    if data.count(i) == 1:
        data.pop(i)
I found the error occurs at the line if data.count(i) == 1:. It says data.count(2) == 1, when clearly there are 2 occurrences of the number 2 in the list. I do not understand why this is giving the wrong answer.
Upvotes: 4
Views: 1037
Reputation: 411
Using del instead of pop:
data = [1,2,3,2,1]
for i in data:
    if data.count(i) == 1:
        index = data.index(i)
        del data[index]
print(data)
produces,
[1, 2, 2, 1]
[Program finished]
Upvotes: 0
Reputation: 9977
This is a recurring problem: you misunderstood list.pop(). It takes an index, not a specific element, so you're not removing what you expect.
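To see the difference, compare pop with remove, which is the method that does take a value:

```python
data = [1, 2, 3, 2, 1]

# pop() takes an index: this removes the element at position 2 (the value 3)
data.pop(2)
print(data)  # [1, 2, 2, 1]

# remove() takes a value: this removes the first occurrence of 2
data.remove(2)
print(data)  # [1, 2, 1]
```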
The thing to do here is to use enumerate:
data = [1,2,3,2,1]
# You could use dup_list = data[:] for Python 3.2 and below
dup_list = data.copy()
for index, item in enumerate(dup_list):
    if dup_list.count(item) == 1:
        data.pop(index)
This way you're popping the item at the right index.
EDIT
I edited, thanks to @wim for his comment. I'm now iterating over a copy (dup_list) of the original list, so as not to iterate over and mutate the same list at the same time. Also, I created the copy explicitly for the sake of explanation, but you could use a shorter version of the code:
data = [1,2,3,2,1]
# using data[:] to iterate over a copy
for index, item in enumerate(data[:]):
    if data.count(item) == 1:
        data.pop(index)
Note that I added a comment because this syntax can be confusing for some people.
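If the slicing idiom is new to you, here is a minimal check of what data[:] actually does:

```python
data = [1, 2, 3]
clone = data[:]       # a full slice builds a brand-new list
print(clone == data)  # True  -- same contents
print(clone is data)  # False -- different objects
data.append(4)
print(clone)          # [1, 2, 3] -- the copy is unaffected by later mutation
```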
Upvotes: 3
Reputation: 2545
I believe a more Pythonic answer could be to use a list comprehension:
result = [x for x in data if data.count(x) > 1]
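On the question's input this keeps every value that appears more than once, in the original order:

```python
data = [1, 2, 3, 2, 1]
result = [x for x in data if data.count(x) > 1]
print(result)  # [1, 2, 2, 1]
```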
I have moved C.Nivis's and Patrick Artner's answers into functions to run timeit on them more easily. To account for the time required to call a function, I have also wrapped the list comprehension in a function call.
def remove_singletons(data):
    """Return list with no singletons, using a for loop."""
    res = []
    for i in data:
        if data.count(i) > 1:
            res.append(i)
    return res

def remove_singletons_lc(data):
    """Return list with no singletons, using a list comprehension."""
    return [x for x in data if data.count(x) > 1]
from collections import Counter

def remove_singletons_counter(data):
    c = Counter(data)
    return [x for x in data if c[x] > 1]
import numpy as np

def remove_singletons_numpy(data):
    a = np.array(data)
    _, ids, counts = np.unique(a, return_counts=True, return_inverse=True)
    return a[counts[ids] != 1]
l = [1,2,3,2,1]
%timeit remove_singletons(l)
>>> 1.42 µs ± 46.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit remove_singletons_lc(l)
>>> 1.2 µs ± 17.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Counter
%timeit remove_singletons_counter(l)
>>> 6.55 µs ± 143 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
numpy.unique
%timeit remove_singletons_numpy(l)
>>> 53.8 µs ± 3.07 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
It appears that the list comprehension is slightly but consistently faster than the loop, and considerably faster than Counter, on small lists. Numpy, for small lists, is the slowest.
Suppose that we have a large list of n random elements, drawn from [0, n]:
import random
n = 10000
l = [random.randint(0, n) for i in range(n)]
%timeit remove_singletons(l)
>>> 1.5 s ± 64.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit remove_singletons_lc(l)
>>> 1.51 s ± 33.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Counter
%timeit remove_singletons_counter(l)
>>> 2.65 ms ± 228 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
numpy.unique
%timeit remove_singletons_numpy(l)
>>> 1.75 ms ± 38.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
For large lists the undisputed winner is numpy.unique, followed just behind by Counter. For small lists a list comprehension seems to do the trick, but for larger lists the numpy.unique approach works best.
Upvotes: 3
Reputation: 59274
Do not modify a list while iterating over it; the behavior will most likely not be the one you want.
One more option is to use numpy.unique with return_counts=True:
a = np.array([1,2,2,3,2,1])
_, ids, counts = np.unique(a, return_counts=True, return_inverse=True)
a[counts[ids] != 1]
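Step by step, this is what the intermediate arrays look like (note that np.unique always returns the optional outputs in the fixed order index, inverse, counts, regardless of keyword order in the call):

```python
import numpy as np

a = np.array([1, 2, 2, 3, 2, 1])
uniq, ids, counts = np.unique(a, return_counts=True, return_inverse=True)
# uniq        -> [1 2 3]        the sorted unique values
# counts      -> [2 3 1]        occurrences of each unique value
# ids         -> [0 1 1 2 1 0]  index into uniq for every element of a
# counts[ids] -> [2 3 3 1 3 2]  per-element occurrence count
print(a[counts[ids] != 1])      # [1 2 2 2 1]
```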
For big arrays this is way faster than a list comprehension and Counter:
from collections import Counter
a = np.array([1,2,2,3,2,1]*1000)  # numpy array
b = list(a)  # plain list
c = Counter(b)  # Counter, used in the last timing below
Then
%timeit _, ids, c = np.unique(a, return_counts=True, return_inverse=True);a[c[ids] != 1]
225 µs ± 11.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit [x for x in a if b.count(x) > 1]
885 ms ± 23.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit [x for x in a if c[x] > 1]
1.53 ms ± 58.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Upvotes: 1
Reputation: 51663
If you have a long list, you should put all the numbers into a Counter(iterable) dictionary.
from collections import Counter
data = [1,2,3,2,1]
c = Counter(data)
cleaned = [x for x in data if c[x] > 1]
print(cleaned)
This counts all occurrences in one pass of your list (O(n)), and looking up how often a value occurs in the created dictionary is O(1). Together this is much faster than using a list comprehension like
result = [x for x in data if data.count(x) > 1]
which, for a list of 100 values, goes through your 100 values 100 times to count each single one of them. That is O(n^2) - a bad thing to have.
Output:
[1,2,2,1]
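If you want to verify the difference yourself, here is a rough benchmark sketch (the function names are mine, and absolute numbers depend on your machine):

```python
import random
import timeit
from collections import Counter

random.seed(0)
n = 2000
data = [random.randint(0, n) for _ in range(n)]

def keep_dups_count(data):
    # rescans the whole list for every element: O(n^2)
    return [x for x in data if data.count(x) > 1]

def keep_dups_counter(data):
    # one counting pass, then O(1) lookups: O(n)
    c = Counter(data)
    return [x for x in data if c[x] > 1]

print(timeit.timeit(lambda: keep_dups_counter(data), number=3))
print(timeit.timeit(lambda: keep_dups_count(data), number=3))
```

Both functions return the same list; only the cost of counting differs.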
Upvotes: 6
Reputation: 4151
You could use a list comprehension to build a new list:
[x for x in data if data.count(x) > 1]
Also, the pop() method takes the index of the element as an argument, not the value.
Upvotes: -1
Reputation: 13106
Try appending to a new list rather than changing your old one:
res = []
data = [1,2,3,2,1]
for i in data:
    if data.count(i) > 1:
        res.append(i)
It's bad practice to change a list's size during iteration, and pop will do exactly that. This returns res = [1, 2, 2, 1].
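To see the kind of bug this avoids, try the in-place removal on an input with two adjacent singletons:

```python
data = [1, 3, 4, 1]          # 3 and 4 are both singletons
for i in data:
    if data.count(i) == 1:
        data.remove(i)       # shrinks the list while we iterate over it
print(data)                  # [1, 4, 1] -- the 4 was skipped, not removed
```

When 3 is removed, every later element shifts left by one, so the iterator steps right past the 4. Building a separate result list sidesteps this entirely.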
Upvotes: 4