Hairo

Reputation: 2214

get max duplicate item in list

I have this list:

mylist = [20, 30, 25, 20, 30]

I can get the indexes of the duplicated values using

[i for i, x in enumerate(mylist) if mylist.count(x) > 1]

the result is:

`[0, 1, 3, 4]` 

which covers two pairs of duplicated values. How can I get only the highest duplicated value? In this list that is 30 (or any of its indexes, 1 or 4) instead of the whole list of duplicated values.

Regards...

Upvotes: 2

Views: 14729

Answers (6)

Vlad Bezden

Reputation: 89527

mylist = [20, 30, 25, 20, 30]
result = max((mylist.count(x), x) for x in set(mylist))
print(result)  # (2, 30)

Here is how it works:

  • set(mylist) - keeps only the unique values from mylist (20, 30, 25)
  • the generator expression then builds one tuple per unique value, with the number of times that value occurred as the first item: ((1, 25), (2, 20), (2, 30))
  • since tuples are compared item by item, max returns the largest tuple in the sequence, which in this case is (2, 30) because it's greater than (2, 20)
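
To make the steps above concrete, here is a small sketch that prints the intermediate tuples (the names `pairs`, `best` and `other` are only for illustration). One thing worth noticing: because the count is the first tuple item, max ranks by frequency first and only uses the value to break ties, so on other inputs it may not return the highest duplicated value.

```python
mylist = [20, 30, 25, 20, 30]

# One (count, value) tuple per distinct value; sorted only to make the order visible.
pairs = sorted((mylist.count(x), x) for x in set(mylist))
print(pairs)  # [(1, 25), (2, 20), (2, 30)]

# Tuples compare element by element: count first, then value.
best = max(pairs)
print(best)  # (2, 30)

# Caveat: a more frequent but smaller value wins over a larger duplicate.
other = [20, 20, 20, 30, 30]
other_best = max((other.count(x), x) for x in set(other))
print(other_best)  # (3, 20), not 30
```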

Upvotes: 0

the wolf

Reputation: 35522

Just some relative timings to consider:

from collections import Counter
from collections import defaultdict

mylist = [20, 30, 25, 20, 30]

def f1():
    seen = set()
    dups = set()
    for x in mylist:
        if x in seen:
            dups.add(x)
        seen.add(x)
    max_dups = max(dups)

def f2():
    max(x for x in mylist if mylist.count(x) > 1)

def f3():
    max(k for k,v in Counter(mylist).items() if v>1)

def f4():
    dd = defaultdict(int)
    for i in mylist:
        dd[i] += 1

    max(i for i in dd if dd[i] > 1)

def f5():
    d = dict.fromkeys(mylist, 0)
    for i in mylist:
        d[i] += 1

    max(i for i in d if d[i] > 1)

cmpthese([f1,f2,f3,f4,f5])    

prints:

   rate/sec     f3     f4     f5     f2     f1
f3   93,653     -- -63.3% -73.0% -79.2% -83.6%
f4  255,137 172.4%     -- -26.3% -43.3% -55.3%
f5  346,238 269.7%  35.7%     -- -23.1% -39.3%
f2  450,356 380.9%  76.5%  30.1%     -- -21.0%
f1  570,419 509.1% 123.6%  64.7%  26.7%     --

So choose wisely
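
Note that cmpthese is not defined in the snippet above; it appears to be a separate benchmarking helper. As a rough stand-in, assuming only the standard library, a timeit sketch along these lines could be used to compare two of the approaches on your own machine. The function names `with_sets` and `with_counter` are mine, and the numbers will differ from the table above.

```python
import timeit
from collections import Counter

mylist = [20, 30, 25, 20, 30]

def with_sets():
    # Same idea as f1: track values already seen, collect the repeats.
    seen = set()
    dups = set()
    for x in mylist:
        if x in seen:
            dups.add(x)
        seen.add(x)
    return max(dups)

def with_counter():
    # Same idea as f3: count everything, keep keys with count > 1.
    return max(k for k, v in Counter(mylist).items() if v > 1)

for func in (with_sets, with_counter):
    # timeit.timeit accepts a zero-argument callable.
    seconds = timeit.timeit(func, number=10_000)
    print(f"{func.__name__}: {seconds:.4f}s for 10,000 calls")
```

With a five-element list the fixed overhead of each approach dominates, so it is probably worth repeating the measurement with a list of realistic size.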

Upvotes: 1

John La Rooy

Reputation: 304137

This one is O(n)

>>> from collections import Counter
>>> mylist = [20, 30, 25, 20, 30]
>>> max(k for k,v in Counter(mylist).items() if v>1)
30
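
The question also asks for the indexes of that value. A minimal sketch building on the Counter approach, where `max_dup` and `indexes` are illustrative names:

```python
from collections import Counter

mylist = [20, 30, 25, 20, 30]

# Highest value that occurs more than once.
max_dup = max(k for k, v in Counter(mylist).items() if v > 1)

# Every position where that value appears (one more pass over the list).
indexes = [i for i, x in enumerate(mylist) if x == max_dup]

print(max_dup, indexes)  # 30 [1, 4]
```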

Upvotes: 6

srgerg

Reputation: 19329

Another O(n) way of doing it, just because...

>>> from collections import defaultdict
>>> 
>>> mylist = [20,30,25,20,30]
>>> dd = defaultdict(int)
>>> for i in mylist:
...    dd[i] += 1
...
>>> max(i for i in dd if dd[i] > 1)
30

You can also do it using a regular old dict:

>>> d = dict.fromkeys(mylist, 0)
>>> for i in mylist:
...   d[i] += 1
... 
>>> max(i for i in d if d[i] > 1)
30

Upvotes: 1

Igor Chubin

Reputation: 64563

$ cat /tmp/1.py
from itertools import groupby

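def find_max_repeated(a):
    # Sort descending so the first repeated group is the largest one.
    a = sorted(a, reverse=True)
    for k, g in groupby(a):
        gl = list(g)
        if len(gl) > 1:
            return gl[0]
    # Falls through and returns None when nothing is repeated.

a = [1,1,2,3,3,4,5,4,6]
print(find_max_repeated(a))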

$ python /tmp/1.py
4

Upvotes: 0

Ned Batchelder

Reputation: 375504

Getting the maximum duplicated value:

max(x for x in mylist if mylist.count(x) > 1)

Unfortunately, this has O(n**2) performance, because count() scans the whole list once for every element. Here's a wordier way to do the same thing that will have O(n) performance, which is important if the list is long:

seen = set()
dups = set()
for x in mylist:
    if x in seen:
        dups.add(x)
    seen.add(x)
max_dups = max(dups)
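
If the list might contain no duplicates at all, max(dups) raises ValueError on the empty set. One possible way to handle that, sketched here as a hypothetical helper `max_duplicate` (my name, not part of any library) and assuming Python 3.4+ for the default argument of max:

```python
def max_duplicate(values, default=None):
    """Return the largest value that occurs more than once, else `default`."""
    seen = set()
    dups = set()
    for x in values:
        if x in seen:
            dups.add(x)
        else:
            seen.add(x)
    # `default` is returned instead of raising when there are no duplicates.
    return max(dups, default=default)

print(max_duplicate([20, 30, 25, 20, 30]))  # 30
print(max_duplicate([1, 2, 3]))             # None
```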

Upvotes: 6
