Reputation: 1

Python count emails in a text file

I have a text file that contains emails and IDs. It looks something like this:

[email protected]:1111111
[email protected]:12313
[email protected]:121213
[email protected]:12313
[email protected]:123113

What I want to do is write code that counts how many times each email domain occurs and lists the counts. For example:

@hotmail.com : 2
@gmail.com : 2
@yahoo.com : 1

I wrote code that counts the email domains, but it also counts the names and the IDs, which I don't want.

Here's the code:

import string
  
# Open the file in read mode
text = open("sample.txt", "r")
  
# Create an empty dictionary
d = dict()
  
# Loop through each line of the file
for line in text:
    # Remove the leading spaces and newline character
    line = line.strip()
  
    # Convert the characters in line to 
    # lowercase to avoid case mismatch
    line = line.lower()
  
    # Put a space before @ and replace : with a space
    line = line.replace("@", " @")
    line = line.replace(":", " ")
  
    # Split the line into words
    words = line.split(" ")
    
    # Iterate over each word in line
    for word in words:
        
        # Check if the word is already in dictionary
        if word in d:
            # Increment count of word by 1
            d[word] = d[word] + 1
        else:
            # Add the word to dictionary with count 1
            d[word] = 1
  
# Print the contents of dictionary
for key in list(d.keys()):
    print(key, ":", d[key])

The output would then look like this:

mark : 1
@gmail.com : 2
1111111 : 1
matt : 1
@hotmail.com : 2
12313 : 2
harry : 1
121213 : 1
matthew : 1
tom : 1
@yahoo.com : 1
123113 : 1

Is there a way to make it count only the words that start with @?

I'm very new to Python, so I appreciate any kind of help! Thank you.

Upvotes: 0

Answers (4)

TahaHaza00

Reputation: 53

One way to solve this problem is to check each word inside the loop over the words: a word that contains @ is the email domain, and anything else is the name or the ID (the ID is the part after the colon). Skip everything that isn't an email domain. Here's the code.

# replace this
    # Iterate over each word in line
    for word in words:

# with this

    # Iterate over each word in line
    for word in words:
        # A word containing "@" is the email domain; anything else
        # is the name or the ID, so skip it
        if "@" not in word:
            continue

In your .txt file you could also label each value, using Email for the email and ID for the ID, like the following:

Email: [email protected]
ID: ID

or something like that. Disclaimer: if the file contains anything that isn't an ID or an email, the code may not work.

Edit: To be clear, the .txt labeling I mentioned won't work as is. You would have to implement it yourself, because I don't know how to.
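For what it's worth, the labeled layout described above could probably be parsed along these lines. This is only a sketch: the function name count_labeled_domains and the sample lines are made up for illustration, and it assumes every line looks like "Label: value" with the label Email marking an address.

```python
from collections import Counter

def count_labeled_domains(lines):
    """Count email domains in lines of the form 'Email: <address>'.

    Lines with any other label (for example 'ID: ...') are ignored.
    """
    counts = Counter()
    for line in lines:
        # Split at the first colon into label and value
        label, sep, value = line.strip().partition(":")
        if not sep or label.strip().lower() != "email":
            continue
        value = value.strip().lower()
        if "@" in value:
            # Keep everything from the last "@" onward, e.g. "@gmail.com"
            counts[value[value.rindex("@"):]] += 1
    return counts

# Made-up sample data in the assumed labeled layout
sample = [
    "Email: mark@gmail.com",
    "ID: 1111111",
    "Email: matt@hotmail.com",
    "ID: 12313",
]
print(count_labeled_domains(sample))
```

To use it on a real file, you could pass the open file object instead of the list, since iterating over a file yields its lines.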

Upvotes: 0

letdatado

Reputation: 241

Do you want to count by domain name (hotmail, yahoo, gmail, etc.)? That is, how many people use gmail or hotmail for their email, and so on. If that's the case, you can use the following code:

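import re

# Read the lines of the file into a list
with open("sample.txt") as f:
    lista = f.read().splitlines()

dict_count = {}
domain_list = []
for element in lista:
    # Find every "@domain.tld" substring in the line
    res = re.findall(r'@\w+\.\w+', element)
    domain_list.append(res)
# Flatten the list of lists into a single list of domains
domain_list = [item for sublist in domain_list for item in sublist]
for item in domain_list:
    if item not in dict_count:
        dict_count[item] = 1
    else:
        dict_count[item] += 1
print(dict_count)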

Thanks a lot. Your feedback will be appreciated.

P.S. Here is the output:
{'@gmail.com': 2, '@hotmail.com': 2, '@yahoo.com': 1}
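One caveat with the pattern @\w+\.\w+ is that it stops after the second label, so an address at mail.example.co.uk would be counted as @mail.example, and a domain containing a hyphen would not match at all. If your file has such domains, a looser pattern might help. The sketch below assumes a domain is just a run of word characters, dots, and hyphens after the @; the names DOMAIN_RE and count_domains and the sample lines are made up.

```python
import re
from collections import Counter

# Assumed pattern: "@" followed by word characters, dots, or hyphens.
# It stops at the ":" that separates the email from the ID.
DOMAIN_RE = re.compile(r"@[\w.-]+")

def count_domains(lines):
    """Count every '@domain' substring found in the given lines."""
    counts = Counter()
    for line in lines:
        counts.update(match.lower() for match in DOMAIN_RE.findall(line))
    return counts

# Made-up sample lines, including one without any email
sample = ["anna@mail.example.co.uk:42", "bob@my-host.org:7", "no email here"]
print(count_domains(sample))
```

This is not a full email validator. For example, a trailing dot right after the domain would be included in the match.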

Upvotes: 0

accdias

Reputation: 5372

Here is a simple alternative using collections.Counter():

from collections import Counter

with open('sample.txt') as f:
    c = Counter([_.strip().split('@')[1].split(':')[0].lower() for _ in f])

print(c)

The code above will result in something like this:

Counter({'gmail.com': 2, 'hotmail.com': 2, 'yahoo.com': 1})
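Note that the one-liner raises an IndexError on any line without an @, such as a blank line at the end of the file. A slightly longer variant that skips such lines and prints in the format the question asks for might look like this. The function name domain_counts and the sample lines are made up for illustration.

```python
from collections import Counter

def domain_counts(lines):
    """Count '@domain' per line, skipping lines that have no email."""
    counts = Counter()
    for line in lines:
        # Drop the ID after the colon and normalize the case
        address = line.strip().split(":")[0].lower()
        # partition never raises: sep is "" when there is no "@"
        _, sep, domain = address.partition("@")
        if sep and domain:
            counts["@" + domain] += 1
    return counts

# Made-up sample lines, including a blank and a malformed one
sample = ["A@Gmail.com:1", "", "b@gmail.com:2", "c@yahoo.com:3", "broken line"]

# most_common() yields (domain, count) pairs, highest count first
for domain, n in domain_counts(sample).most_common():
    print(domain, ":", n)
```

With a real file you could call domain_counts(f) inside the with open('sample.txt') as f: block.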

Upvotes: 3

user15801675

Reputation:

Here is the code.

line=line.strip("\n").split(":")[0] is the main line. It first strips the "\n" character, then splits the line on : and takes the first part.

from collections import Counter
with open("z.txt","r+") as file:
    email=[]
    read_lines=file.readlines()
    for line in read_lines:
        line=line.strip("\n").split(":")[0]
        x=line.index("@")
        email.append(line[x:])
mail_servers=dict(Counter(email))
print("------ Search Found ------\n")
for key,value in mail_servers.items():
    print(key,":",value)

Output:

------ Search Found ------

@gmail.com : 2
@hotmail.com : 2
@yahoo.com : 1

Upvotes: 0
