gonzalloe

Reputation: 313

How to check if there are more than 3 identical strings in a list in Python

I have the list shown below:

a = ['www.hughes-family.org', 'www.bondedsender.com', 'thinkgeek.com', 'www.hughes-family.org', 'www.hughes-family.org', 'lists.sourceforge.net', 'www.hughes-family.org']

How can I check whether any URL appears more than three times in this list? I've tried the set() function, but that only tells me whether there is any duplicated URL at all. This is what I tried:

if len(set(a)) < len(a):

Upvotes: 0

Views: 244

Answers (4)

Ganesh Tata

Reputation: 1195

I am assuming that you want to check whether any of the URLs occurs more than 3 times in the list. You can go through the list and build a dictionary with the strings as keys and their respective counts as values (similar to the output of collections.Counter).

In [1]: a = ['www.hughes-family.org', 'www.bondedsender.com', 'thinkgeek.com', 'www.hughes-family.org', 'www.hughes-family.org', 'lists.sourceforge.net', 'www.hughes-family.org']

In [2]: is_present = False

In [3]: url_counts = dict()

In [4]: for url in a:
    ...:     if not url_counts.get(url, None):  # If the URL is not present as a key, insert the URL with value 0
    ...:         url_counts[url] = 0
    ...:     url_counts[url] += 1  # Increment count
    ...:     if url_counts[url] > 3:  # Check if the URL occurs more than three times
    ...:         print("The URL", url, "occurs more than three times!")
    ...:         is_present = True
    ...:         break  # Stop as soon as any URL occurs more than three times

# output - The URL www.hughes-family.org occurs more than three times!

In [5]: is_present  # To check if there is a URL which occurs more than three times
Out[5]: True
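
Since the loop above builds the same mapping that collections.Counter produces, here is a minimal sketch of the Counter-based variant. The helper name urls_over_threshold and its threshold parameter are made up for illustration; they are not part of the original answer.

```python
from collections import Counter

def urls_over_threshold(urls, threshold=3):
    """Hypothetical helper: return the URLs that occur more than `threshold` times."""
    counts = Counter(urls)  # maps each URL to its number of occurrences
    return [url for url, n in counts.items() if n > threshold]

a = ['www.hughes-family.org', 'www.bondedsender.com', 'thinkgeek.com',
     'www.hughes-family.org', 'www.hughes-family.org',
     'lists.sourceforge.net', 'www.hughes-family.org']

repeated = urls_over_threshold(a)
print(repeated)  # ['www.hughes-family.org']
```

Unlike the loop with break, this counts the whole list first, so it reports every URL above the threshold rather than stopping at the first one.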

Upvotes: 0

Aaditya Ura

Reputation: 12669

You can use a dict to record the indices at which each string occurs:

a = ['www.hughes-family.org', 'www.bondedsender.com', 'thinkgeek.com', 'www.hughes-family.org', 'www.hughes-family.org', 'lists.sourceforge.net', 'www.hughes-family.org']

count={}
for i,j in enumerate(a):
    if j not in count:
        count[j]=[i]
    else:
        count[j].append(i)


for i,j in count.items():
    if len(j)>3:  # the string occurs more than three times
        pass  # do your stuff

print(count)

output:

{'www.hughes-family.org': [0, 3, 4, 6], 'thinkgeek.com': [2], 'www.bondedsender.com': [1], 'lists.sourceforge.net': [5]}

Alternatively, you can use defaultdict:

import collections

d=collections.defaultdict(list)
for i,j in enumerate(a):
    d[j].append(i)

print(d)
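
To get from the index mapping to the actual check, one possible sketch is to keep only the entries whose index list has more than three items. The names positions and frequent are illustrative and do not appear in the answer above.

```python
from collections import defaultdict

a = ['www.hughes-family.org', 'www.bondedsender.com', 'thinkgeek.com',
     'www.hughes-family.org', 'www.hughes-family.org',
     'lists.sourceforge.net', 'www.hughes-family.org']

# Record every index at which each URL appears.
positions = defaultdict(list)
for index, url in enumerate(a):
    positions[url].append(index)

# Keep only the URLs that appear more than three times.
frequent = {url: idxs for url, idxs in positions.items() if len(idxs) > 3}
print(frequent)  # {'www.hughes-family.org': [0, 3, 4, 6]}
```

Storing indices costs more memory than storing counts, so this is probably only worth it if you also need to know where the repeats are.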

Upvotes: 1

Ajax1234

Reputation: 71451

You can use list.count to find the urls that occur more than three times:

urls = ['www.hughes-family.org', 'www.bondedsender.com', 'thinkgeek.com', 'www.hughes-family.org', 'www.hughes-family.org', 'lists.sourceforge.net', 'www.hughes-family.org']
new_urls = [url for url in set(urls) if urls.count(url) > 3]  # count each distinct url once
if new_urls:
    pass #condition met

Upvotes: 2

Francisco

Reputation: 11476

Use Counter.most_common:

>>> from collections import Counter
>>> Counter(a).most_common(1)[0][1]
4

This returns the number of times the most common element appears, so the condition is met when that number is greater than 3.
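
As a rough sketch, the one-liner can be wrapped in a function that also copes with an empty list, where most_common(1) returns [] and indexing [0] would raise IndexError. The function name has_more_than and its limit parameter are assumptions made for this example.

```python
from collections import Counter

def has_more_than(items, limit=3):
    """Hypothetical helper: True if any item occurs more than `limit` times."""
    top = Counter(items).most_common(1)  # [(item, count)] or [] for an empty list
    return bool(top) and top[0][1] > limit

a = ['www.hughes-family.org', 'www.bondedsender.com', 'thinkgeek.com',
     'www.hughes-family.org', 'www.hughes-family.org',
     'lists.sourceforge.net', 'www.hughes-family.org']

print(has_more_than(a))   # True
print(has_more_than([]))  # False
```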

Upvotes: 7
