Reputation: 99

Find the email address that occurs the most in a txt file

I have to go through a .txt file that contains all kinds of information and pull out the email address that occurs most often in it.

My code is below, but it does not work: it prints no output, and I am not sure why.

name = input("Enter file:")
if len(name) < 1 : name = "mbox-short.txt"
handle = open(name)
names = handle.readlines()
count = dict()
for name in names:
    name = name.split()
    for letters in name:
        if '@' not in letters: 
            name.remove(letters)
        else: 
            continue
    name = str(name)
    if name not in count:
        count[name] = 1
    else: 
        count[name] = count[name]+ 1
print(max(count, key=count.get(1)))

As I understand it, this code works as follows:

We first open the file, then read the lines, then create an empty dict.

Then, in the first for loop, we split the txt file into a list based on each line. In the second for loop, every item in the line that contains no @ is removed. We then return to the outer for loop, where, if the name is not a key in the dict, it is added with a value of 1; otherwise 1 is added to its value.

Finally, we print the max key & value.

Where did I go wrong?

Thanks in advance for your help.

Upvotes: 1

Answers (3)

MoeNeuron

Reputation: 68

You need to change the last line to:

print(max(count, key=count.get))

EDIT

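To explain further:

You were passing max() the wrong key function. key=count.get(1) does not pass the get method; it calls it immediately with the argument 1.

Since 1 is not a key in your dictionary, count.get(1) returns its default value, None, so you were effectively calling max(count, key=None).

With key=None, max() simply compares the keys themselves, so it returns the largest string key in lexicographic order (as long as all your keys are strings and the dictionary is not empty), not the key with the highest count. Note that key=None is only accepted by max() from Python 3.8 onward; older versions raise a TypeError instead.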
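The last line is not the only problem in the question's loop: calling name.remove() inside `for letters in name` mutates the list being iterated, which skips the element right after each removal, and str(name) turns the filtered list into its string representation, so each line becomes one key instead of each address. Below is a minimal sketch that counts every whitespace-separated token containing '@' and then uses key=count.get correctly. The function name most_common_email and the sample lines are hypothetical, and treating any token with '@' as an address is the same simplifying assumption the question's code makes.

```python
def most_common_email(lines):
    """Return the '@'-containing token that appears most often in lines."""
    count = {}
    for line in lines:
        # Check each word; nothing is removed from the list being iterated.
        for word in line.split():
            if '@' in word:
                count[word] = count.get(word, 0) + 1
    # Pass the bound method itself, not the result of calling it.
    return max(count, key=count.get)


# Hypothetical sample input standing in for the file's lines.
sample = [
    "From alice@example.com Sat Jan 5",
    "From: bob@example.com",
    "From alice@example.com Fri Jan 4",
    "To: zed@example.com",
]
print(most_common_email(sample))  # alice@example.com

# Why the key matters: without it, max() compares the strings themselves.
counts = {"alice@example.com": 2, "bob@example.com": 1, "zed@example.com": 1}
print(max(counts, key=counts.get))  # alice@example.com (highest count)
print(max(counts))                  # zed@example.com (largest string)
```

With a real file, the open handle can be passed directly, since iterating over it yields lines: `with open("mbox-short.txt") as handle: print(most_common_email(handle))`.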

Upvotes: 2

user14389165

Reputation:

Use the regular-expression module re; it makes extracting the email addresses much easier.

import re
name = input("Enter file:")
if len(name) < 1 : name = "mbox-short.txt"
with open(name) as handle:
    names = handle.read()
# Every substring that looks like an email address, in file order
email_ids = re.findall(r"[0-9a-zA-Z._+%]+@[0-9a-zA-Z._+%]+[.][0-9a-zA-Z.]+", names)
# Pair each distinct address with its count; sorted() returns a new list,
# whereas list.sort() sorts in place and returns None
email_ids = sorted({(email_ids.count(email_id), email_id) for email_id in email_ids}, reverse=True)
# Keep just the addresses, in that order (a set would discard the ordering)
email_ids = [i[1] for i in email_ids]
print(email_ids[0])

The list email_ids then holds each distinct address once, ordered by number of occurrences in descending order, so email_ids[0] is the most frequent one.

I know the code is a bit long and has a few redundant lines, but they are there to make it self-explanatory.
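The standard library's collections.Counter can do the counting in a single pass, which avoids calling list.count() once per match (quadratic in the number of matches). A minimal sketch reusing the answer's regular expression; the function name top_email and the sample text are hypothetical:

```python
import re
from collections import Counter

# Same pattern as in the answer above.
EMAIL_RE = re.compile(r"[0-9a-zA-Z._+%]+@[0-9a-zA-Z._+%]+[.][0-9a-zA-Z.]+")


def top_email(text):
    """Return (address, occurrences) for the most frequent match, or None."""
    counts = Counter(EMAIL_RE.findall(text))
    if not counts:
        return None
    # most_common(1) returns a one-element list of (item, count) pairs.
    return counts.most_common(1)[0]


# Hypothetical sample text standing in for the file contents.
sample = """From alice@example.com Sat Jan 5
From: bob@example.com
Author: alice@example.com
To: zed@example.com"""
print(top_email(sample))  # ('alice@example.com', 2)
```

Calling counts.most_common() with no argument returns every address ordered by count, most frequent first, which is the full ranking the answer builds by hand.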

Upvotes: 0

Roberto Pérez Rico

Reputation: 1

Please use the following code:

names = '''[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]'''
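# One entry per line, duplicates included (split() already returns a list)
count = names.split("\n")
# Each distinct address once
sett = set(count)

# Start with the first address as the current best
highest = count.count(count[0])
theone = count[0]
for i in sett:
    l = count.count(i)  # occurrences of this address
    if l > highest:
        highest = l
        theone = i
print(theone)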

Output:

[email protected]

Upvotes: 0
