Reputation: 1
I have a text file that contains emails and IDs. It looks something like this:
[email protected]:1111111
[email protected]:12313
[email protected]:121213
[email protected]:12313
[email protected]:123113
What I want to do is write code that counts how many times each email domain occurs and lists them for me. For example:
@hotmail.com : 2
@gmail.com : 2
@yahoo.com : 1
I wrote code that counts the domains, but it also counts the names and the IDs, which I don't want.
Here's the code:
import string
# Open the file in read mode
text = open("sample.txt", "r")
# Create an empty dictionary
d = dict()
# Loop through each line of the file
for line in text:
    # Remove leading/trailing spaces and the newline character
    line = line.strip()
    # Convert the characters in line to
    # lowercase to avoid case mismatch
    line = line.lower()
    # Put a space before @ and replace : with a space
    line = line.replace("@", " @")
    line = line.replace(":", " ")
    # Split the line into words
    words = line.split(" ")
    # Iterate over each word in line
    for word in words:
        # Check if the word is already in dictionary
        if word in d:
            # Increment count of word by 1
            d[word] = d[word] + 1
        else:
            # Add the word to dictionary with count 1
            d[word] = 1
# Print the contents of dictionary
for key in list(d.keys()):
    print(key, ":", d[key])
The output would then look like this:
mark : 1
@gmail.com : 2
1111111 : 1
matt : 1
@hotmail.com : 2
12313 : 2
harry : 1
121213 : 1
matthew : 1
tom : 1
@yahoo.com : 1
123113 : 1
Is there a way to make it count only the words that start with @?
I'm very new to Python, so I'd appreciate any kind of help! Thank you.
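Keeping the approach from the question, the smallest change is probably to skip every word that does not start with "@" before counting it. A minimal sketch, assuming lines shaped like name@domain:id; count_domains is a hypothetical helper name and the sample addresses are made up, since the real ones are not shown:

```python
def count_domains(lines):
    """Count only the words that start with "@", following the question's approach."""
    d = dict()
    for line in lines:
        line = line.strip().lower()
        # Same trick as the question: separate name, domain and ID with spaces
        line = line.replace("@", " @")
        line = line.replace(":", " ")
        for word in line.split(" "):
            # Skip names and IDs; only "@domain" words are counted
            if not word.startswith("@"):
                continue
            if word in d:
                d[word] = d[word] + 1
            else:
                d[word] = 1
    return d


# Hypothetical sample data in the same "name@domain:id" shape as sample.txt
sample = [
    "alice@gmail.com:1111111",
    "bob@hotmail.com:12313",
    "carol@gmail.com:121213",
    "dave@hotmail.com:12313",
    "erin@yahoo.com:123113",
]

counts = count_domains(sample)
for key in counts:
    print(key, ":", counts[key])
```

To read the real file, pass the open file object instead of the list, e.g. with open("sample.txt") as text: counts = count_domains(text).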
Upvotes: 0
Views: 605
Reputation: 53
One way to solve this problem is to loop through each line of text and decide whether it holds an email (it contains @) or an ID (it follows a colon or is made of numbers, assuming the emails themselves contain no numbers). Here's the code:
# replace this
# Loop through each line of the file
for line in text:
    # Remove the leading spaces and newline character
    line = line.strip()

# with this
# Loop through each line of the file
for line in text:
    line = line.strip()
    # A line containing "@" holds an email; anything else is treated as an ID
    if "@" in line:
        Email = line
    else:
        ID = line
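The same email-or-ID check can also be applied to the original file format by splitting each line at the colon first. A minimal sketch, assuming every line has the shape name@domain:id and that IDs are purely numeric; classify is a hypothetical helper name and the sample line is made up:

```python
def classify(line):
    """Split one "name@domain:id" line and label each piece as an email or an ID."""
    emails, ids = [], []
    for part in line.strip().split(":"):
        if "@" in part:
            # Contains "@", so treat it as an email address
            emails.append(part)
        elif part.isdigit():
            # Purely numeric, so treat it as an ID
            ids.append(part)
    return emails, ids


emails, ids = classify("alice@gmail.com:1111111")  # hypothetical sample line
print(emails, ids)
```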
Inside your .txt file, you can give each value a label, Email or ID, like the following:
Email: [email protected]
ID: ID
or something like that. DISCLAIMER: If a line holds something other than an ID or an email, the code may not work.
Edit: I meant that the .txt labelling I mentioned may not work as-is; you'll have to set it up yourself, because I don't know how to.
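If you do switch to the labelled layout suggested above, reading it back and counting domains is fairly short. A minimal sketch, assuming each line carries exactly one "Email:" or "ID:" label as in the example; count_labelled and the sample lines are hypothetical:

```python
from collections import Counter


def count_labelled(lines):
    """Count email domains in lines of the form "Email: ..." or "ID: ..."."""
    counts = Counter()
    for line in lines:
        # Split on the first ":" only: "Email: a@b.com" -> ("Email", ":", " a@b.com")
        label, sep, value = line.strip().partition(":")
        if not sep:
            continue  # blank or unlabelled line
        value = value.strip()
        # Only "Email" lines carry a domain; "ID" lines are ignored
        if label.strip().lower() == "email" and "@" in value:
            counts[value[value.index("@"):].lower()] += 1
    return counts


sample = [
    "Email: alice@gmail.com",
    "ID: 1111111",
    "Email: bob@hotmail.com",
    "ID: 12313",
    "Email: carol@gmail.com",
    "ID: 121213",
]
print(count_labelled(sample))
```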
Upvotes: 0
Reputation: 241
Do you want to count by domain name (hotmail, yahoo, gmail, etc.), i.e. how many people use gmail or hotmail for their email? If that's the case, you can use the following code:
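import re

# Read the lines of the file into a list
with open("sample.txt") as f:
    lista = f.readlines()

dict_count = {}
domain_list = []
for element in lista:
    # Find the "@domain.tld" part of each line
    res = re.findall(r'@\w+\.\w+', element)
    domain_list.append(res)
# Flatten the list of lists into a single list of domains
domain_list = [item for sublist in domain_list for item in sublist]
for item in domain_list:
    if item not in dict_count:
        dict_count[item] = 1
    else:
        dict_count[item] += 1
print(dict_count)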
Thanks a lot; your feedback will be appreciated.
P.S. Here is the output:
{'@gmail.com': 2, '@hotmail.com': 2, '@yahoo.com': 1}
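The same regex idea can be written more compactly with collections.Counter, which replaces both the manual dictionary bookkeeping and the flattening step. This is a sketch, not the answerer's code; the pattern @[\w.-]+ is an assumption chosen so that multi-part domains such as @mail.co.uk are kept whole, whereas @\w+\.\w+ would stop at @mail.co. The sample lines are made up:

```python
import re
from collections import Counter

# Assumed pattern: "@" followed by letters, digits, "_", "." or "-"
DOMAIN_RE = re.compile(r"@[\w.-]+")


def count_domains(lines):
    """Count every "@domain" match across all lines, case-insensitively."""
    counts = Counter()
    for line in lines:
        # findall returns every match on the line; update() counts each one
        counts.update(match.lower() for match in DOMAIN_RE.findall(line))
    return counts


sample = [
    "alice@gmail.com:1111111",
    "bob@hotmail.com:12313",
    "frank@mail.co.uk:555",
]
print(count_domains(sample))
```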
Upvotes: 0
Reputation: 5372
Here is a simple alternative using collections.Counter():
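from collections import Counter

with open('sample.txt') as f:
    # For each line, keep the text between "@" and ":" (the domain); skip lines without "@"
    c = Counter(line.strip().split('@')[1].split(':')[0].lower() for line in f if '@' in line)
print(c)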
The code above will result in something like this:
Counter({'gmail.com': 2, 'hotmail.com': 2, 'yahoo.com': 1})
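To print the result in the "@domain : count" layout the question asks for, you can iterate over Counter.most_common(), which yields (key, count) pairs from most to least common. A sketch, assuming the counter looks like the one above; format_counts is a hypothetical helper, and the "@" is added back because split('@') removes it:

```python
from collections import Counter


def format_counts(counter):
    """Return lines like "@gmail.com : 2", most common domain first."""
    return [f"@{domain} : {count}" for domain, count in counter.most_common()]


# Same counts as the output shown above
c = Counter({"gmail.com": 2, "hotmail.com": 2, "yahoo.com": 1})
for row in format_counts(c):
    print(row)
```

Domains with equal counts keep the order in which they were first seen.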
Upvotes: 3
Reputation:
Here is the code. The main line is line=line.strip("\n").split(":")[0]: it first strips the "\n" character, then splits the line on ":" and takes the first part.
from collections import Counter

with open("z.txt", "r") as file:
    email = []
    read_lines = file.readlines()
    for line in read_lines:
        # Drop the newline, then keep only the part before ":"
        line = line.strip("\n").split(":")[0]
        # Keep everything from "@" onwards, e.g. "@gmail.com"
        x = line.index("@")
        email.append(line[x:])

mail_servers = dict(Counter(email))
print("------ Search Found ------\n")
for key, value in mail_servers.items():
    print(key, ":", value)
Output:
------ Search Found ------
@gmail.com : 2
@hotmail.com : 2
@yahoo.com : 1
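Note that line.index("@") raises ValueError if a line has no "@", for example a blank line at the end of the file. A defensive variant of the same strip, split and slice steps, assuming lines without "@" should simply be skipped; domains_from_lines and the sample lines are hypothetical:

```python
from collections import Counter


def domains_from_lines(lines):
    """Collect "@domain" strings, skipping lines that contain no "@"."""
    email = []
    for line in lines:
        # Same steps as above: drop the newline, keep the part before ":"
        line = line.strip("\n").split(":")[0]
        if "@" not in line:
            continue  # blank or malformed line; index() would raise ValueError
        email.append(line[line.index("@"):])
    return email


sample = ["alice@gmail.com:1111111\n", "bob@hotmail.com:12313\n", "\n"]
print(dict(Counter(domains_from_lines(sample))))
```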
Upvotes: 0