Reputation: 2214

get max duplicate item in list

I have this list:

mylist = [20, 30, 25, 20, 30]

After getting the indexes of the duplicated values using

[i for i, x in enumerate(mylist) if mylist.count(x) > 1]

the result is:

`[0, 1, 3, 4]`

which covers two pairs of duplicated values. How can I get only the highest duplicated value? In this list that is 30, or either of its indexes (1 or 4), rather than the indexes of all the duplicated values.

Regards...

Upvotes: 2

Answers (6)

Vlad Bezden

Reputation: 89765

mylist = [20, 30, 25, 20, 30]
result = max((mylist.count(x), x) for x in set(mylist))
print(result)  # (2, 30)

Here is how it works:

set(mylist) - keeps only the unique values of mylist: {20, 30, 25}
the generator expression then builds a (count, value) tuple for each unique value: (1, 25), (2, 20), (2, 30)
tuples compare item by item, so max() returns the greatest tuple in the sequence, which here is (2, 30) because it is greater than (2, 20)
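
Note that this compares the count first and only uses the value as a tie-breaker, so it matches the question only when all duplicates occur equally often. A minimal sketch of the difference, using a hypothetical list `sample` that is not from the question:

```python
sample = [20, 20, 20, 30, 30]

# Tuples compare the count first, so the most frequent value wins.
by_count = max((sample.count(x), x) for x in set(sample))

# Filtering on the count and comparing only the values gives the highest duplicate.
highest_dup = max(x for x in set(sample) if sample.count(x) > 1)

print(by_count)     # (3, 20)
print(highest_dup)  # 30
```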

Upvotes: 0

the wolf

Reputation: 35572

Just some relative timings to consider:

from collections import Counter
from collections import defaultdict

mylist = [20, 30, 25, 20, 30]

def f1():
    seen = set()
    dups = set()
    for x in mylist:
        if x in seen:
            dups.add(x)
        seen.add(x)
    max_dups = max(dups)

def f2():
    max(x for x in mylist if mylist.count(x) > 1)

def f3():
    max(k for k,v in Counter(mylist).items() if v>1)

def f4():
    dd = defaultdict(int)
    for i in mylist:
        dd[i] += 1

    max(i for i in dd if dd[i] > 1)

def f5():
    d = dict.fromkeys(mylist, 0)
    for i in mylist:
        d[i] += 1

    max(i for i in d if d[i] > 1)

cmpthese([f1,f2,f3,f4,f5])

prints:

   rate/sec     f3     f4     f5     f2     f1
f3   93,653     -- -63.3% -73.0% -79.2% -83.6%
f4  255,137 172.4%     -- -26.3% -43.3% -55.3%
f5  346,238 269.7%  35.7%     -- -23.1% -39.3%
f2  450,356 380.9%  76.5%  30.1%     -- -21.0%
f1  570,419 509.1% 123.6%  64.7%  26.7%     --

So choose wisely
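
`cmpthese` is a helper that is not shown in this answer. A rough way to reproduce the comparison with only the standard library is `timeit`. The sketch below is an assumption about how one might do that, with renamed functions that take the list as an argument and return their result. Absolute numbers will depend on the machine, the Python version, and the list length.

```python
import timeit
from collections import Counter

mylist = [20, 30, 25, 20, 30]

def with_sets(values):
    # One pass: remember what has been seen, collect the repeats.
    seen, dups = set(), set()
    for x in values:
        if x in seen:
            dups.add(x)
        seen.add(x)
    return max(dups)

def with_count(values):
    # list.count() rescans the whole list for every element.
    return max(x for x in values if values.count(x) > 1)

def with_counter(values):
    # Counter tallies all values in one pass.
    return max(k for k, v in Counter(values).items() if v > 1)

for func in (with_sets, with_count, with_counter):
    seconds = timeit.timeit(lambda: func(mylist), number=10_000)
    print(f"{func.__name__}: {seconds:.4f} s for 10,000 calls")
```

On a five-element list, fixed overhead such as building a `Counter` dominates. With much longer lists the quadratic `count()` version should be expected to fall behind, so it is worth timing with data of a realistic size.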

Upvotes: 1

John La Rooy

Reputation: 304473

This one is O(n)

>>> from collections import Counter
>>> mylist = [20, 30, 25, 20, 30]
>>> max(k for k,v in Counter(mylist).items() if v>1)
30
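
The question also asks for the indexes of that value. A minimal sketch building on the same `Counter` idea; the names `top` and `positions` are illustrative and not part of the original answer:

```python
from collections import Counter

mylist = [20, 30, 25, 20, 30]

counts = Counter(mylist)
# Highest value that occurs more than once.
top = max(k for k, v in counts.items() if v > 1)
# Every position at which that value appears.
positions = [i for i, x in enumerate(mylist) if x == top]

print(top, positions)  # 30 [1, 4]
```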

Upvotes: 6

srgerg

Reputation: 19339

Another O(n) way of doing it, just because...

>>> from collections import defaultdict
>>> 
>>> mylist = [20,30,25,20,30]
>>> dd = defaultdict(int)
>>> for i in mylist:
...    dd[i] += 1
...
>>> max(i for i in dd if dd[i] > 1)
30

You can also do it using a regular old dict:

>>> d = dict.fromkeys(mylist, 0)
>>> for i in mylist:
...   d[i] += 1
... 
>>> max(i for i in d if d[i] > 1)
30

Upvotes: 1

Igor Chubin

Reputation: 64623

$ cat /tmp/1.py
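from itertools import groupby

def find_max_repeated(a):
    # Sort descending so the first run of equal values longer than one is the answer.
    a = sorted(a, reverse=True)
    for k, g in groupby(a):
        if len(list(g)) > 1:
            return k
    # Falls through and returns None when nothing repeats.

a = [1, 1, 2, 3, 3, 4, 5, 4, 6]
print(find_max_repeated(a))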

$ python /tmp/1.py
4

Upvotes: 0

Ned Batchelder

Reputation: 376052

Getting the maximum duplicated value:

max(x for x in mylist if mylist.count(x) > 1)

This has O(n**2) performance because of the repeated count() calls, unfortunately. Here's a wordier way to do the same thing that will have O(n) performance, important if the list is long:

seen = set()
dups = set()
for x in mylist:
    if x in seen:
        dups.add(x)
    seen.add(x)
max_dups = max(dups)
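
One caveat: `max(dups)` raises `ValueError` when the list contains no duplicates. A small sketch that wraps the loop in a function and uses the `default` argument of `max` (Python 3.4+); the name `max_duplicate` and the choice of `None` as the fallback are assumptions, not part of the original answer:

```python
def max_duplicate(values):
    """Return the largest value that occurs more than once, or None."""
    seen = set()
    dups = set()
    for x in values:
        if x in seen:
            dups.add(x)
        seen.add(x)
    # default=None avoids ValueError when no value repeats.
    return max(dups, default=None)

print(max_duplicate([20, 30, 25, 20, 30]))  # 30
print(max_duplicate([1, 2, 3]))             # None
```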

Upvotes: 6
