Denise

Removing characters from each item in a list and counting duplicate items

I have a text file in which each row contains an HTTP request. I first created a list from the text file, and now I am trying to count how many requests each domain sent. Each row has the full URL, so I need to remove everything after ".com" to keep only the domain, and then count the total number of requests made by that domain. For instance, based on the list below, the output would be:
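The approach described above (cut each URL right after the first ".com", then tally the domains) can be sketched with plain string methods and collections.Counter. This is a minimal sketch, assuming every URL contains ".com"; the sample URLs and the helper name domain_of are hypothetical, since the original list is not shown.

```python
from collections import Counter

# Hypothetical sample data; the question's actual list is not shown.
urls = [
    "https://news.com/world/article1",
    "https://books.com/fiction?id=7",
    "https://news.com/sports",
]

def domain_of(url):
    """Hypothetical helper: keep everything up to and including the first '.com'."""
    # str.index raises ValueError if '.com' is missing, so this assumes it is present.
    end = url.index(".com") + len(".com")
    return url[:end]

counts = Counter(domain_of(url) for url in urls)
print(counts)  # Counter({'https://news.com': 2, 'https://books.com': 1})
```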

Answers (1)

cs95

You could do this using re and a Counter:

  1. Extract the domains with re.match
  2. Pass the resulting generator expression to the Counter constructor
from collections import Counter
import re

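# Non-greedy match up to the first '.com' (a greedy '.*com' would run to the last "com" in the URL)
c = Counter(re.match(r'.*?\.com', i).group(0) for i in my_list)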

print(c)
Counter({'https:/books.com': 3, 'https:/news.com': 4, 'https:/recipes.com': 4})
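The choice between a greedy and a non-greedy pattern matters here. A greedy '.*com' matches up to the last "com" in the string, so a URL whose path also contains "com" is cut in the wrong place; a non-greedy '.*?\.com' (with the dot escaped) stops at the first ".com". The URL below is hypothetical, chosen only to show the difference:

```python
import re

url = "https://news.com/comments/42"  # hypothetical URL with 'com' in its path

greedy = re.match('.*com', url).group(0)        # runs to the last 'com'
non_greedy = re.match(r'.*?\.com', url).group(0)  # stops at the first '.com'

print(greedy)      # https://news.com/com
print(non_greedy)  # https://news.com
```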

Note that re.match returns None for a line that does not match, and calling .group(0) on None raises AttributeError. A generator expression cannot contain a try/except, so if your list may contain an invalid URL, consider using a loop instead:

r = []
for i in my_list:
    try:
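        # Keep everything up to the first '.com' (non-greedy match)
        r.append(re.match(r'.*?\.com', i).group(0))
    except AttributeError:
        # re.match returned None (no '.com' in the line), so skip it
        pass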

c = Counter(r)
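If you would rather keep a single expression, another option (not part of the answer above) is to skip non-matching lines instead of catching the exception: since re.match returns None when there is no match, the None results can be filtered out before counting. A minimal sketch, with hypothetical sample data that includes one line without ".com":

```python
import re
from collections import Counter

# Hypothetical input: the last entry has no '.com' and should be skipped.
my_list = [
    "https://news.com/a",
    "https://recipes.com/b",
    "https://news.com/c",
    "not a url",
]

pattern = re.compile(r'.*?\.com')  # everything up to the first '.com'

matches = (pattern.match(line) for line in my_list)  # None for non-matching lines
c = Counter(m.group(0) for m in matches if m is not None)

print(c)  # Counter({'https://news.com': 2, 'https://recipes.com': 1})
```

Compiling the pattern once is optional, because the re module caches recently used patterns, but it keeps the counting expression short.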
