Reputation: 11
I'm having trouble with the output of my program. I have a text file (https://www.py4e.com/code3/mbox.txt), and I want Python to first print how many email addresses it contains and then print each of those addresses on its own line. A sample of my current output looks like this:
Received: (from apache@localhost)
There were 22003 email addresses in mbox.txt
for [email protected]; Thu, 18 Oct 2007 11:31:49 -0400
There were 22004 email addresses in mbox.txt
X-Authentication-Warning: nakamura.uits.iupui.edu: apache set sender to [email protected] using -f
There were 22005 email addresses in mbox.txt
What am I doing wrong here? Here's my code:
fhand = open('mbox.txt')
count = 0
for line in fhand:
    line = line.rstrip()
    if '@' in line:
        count = count + 1
        print('There were', count, 'email addresses in mbox.txt')
    if '@' in line:
        print(line)
Upvotes: 1
Views: 65
Reputation: 17156
The following modifies your code to use a regular expression to find emails in text lines.
import re
# Pattern for email
# (see https://www.geeksforgeeks.org/extracting-email-addresses-using-regular-expressions-python/)
pattern = re.compile(r'\S+@\S+')
with open('mbox.txt') as fhand:
    emails = []
    for line in fhand:
        # Detect all emails in line using regex pattern
        found_emails = pattern.findall(line)
        if found_emails:
            emails.extend(found_emails)
print('There were', len(emails), 'email addresses in mbox.txt')
if emails:
    print(*emails, sep="\n")
Output
There were 44018 email addresses in mbox.txt
[email protected]
<[email protected]>
<[email protected]>
<[email protected]>;
<[email protected]>;
<[email protected]>;
apache@localhost)
[email protected];
[email protected]
[email protected]
....
....
...etc...
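Note that \S+@\S+ grabs everything around the @ up to the nearest whitespace, which is why some of the matches above still carry characters such as <, >, ; or ). If you want bare addresses, one option is a narrower pattern. The following is only a sketch: the names EMAIL_PATTERN, find_emails and collect_emails are made up for illustration, and the pattern is a simplified assumption about what an address looks like, not a full email validator.

```python
import re

# Assumed, simplified pattern: a local part, "@", then dot-separated domain labels.
# It deliberately excludes brackets, semicolons and parentheses, and it is
# not a complete RFC 5322 address matcher.
EMAIL_PATTERN = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)*')


def find_emails(line):
    """Return the address-like substrings found in one line of text."""
    return EMAIL_PATTERN.findall(line)


def collect_emails(path):
    """Read the file at `path` and return every address-like match, in order."""
    emails = []
    with open(path) as fhand:
        for line in fhand:
            emails.extend(find_emails(line))
    return emails
```

Calling collect_emails('mbox.txt') and printing len() of the result gives the count; the list still contains repeats, since the same address appears on many lines.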
Upvotes: 1
Reputation: 479
Can you make it clearer what your expected output is compared to your actual output?
You have two if '@' in line statements that should be combined; there's no reason to ask the same question twice.
You count the number of lines that contain an @ symbol and then per line, print the current count.
If you want to only print the count once, then put it outside (after) your for loop.
If you want to print the email addresses and not the whole lines that contain them, then you'll need to do some more string processing to extract the email from the line.
Don't forget to close your file when you've finished with it.
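Putting those points together, a rough sketch could look like the following. It uses plain string methods rather than a regex; the function names extract_addresses and report are hypothetical, and the set of punctuation being stripped is an assumption based on the sample output in the question, so adjust it to your data.

```python
def extract_addresses(line):
    """Return whitespace-separated tokens that contain '@', with edge punctuation removed."""
    addresses = []
    for token in line.split():
        if '@' in token:
            # Assumed punctuation set; it only trims the ends of the token.
            cleaned = token.strip('<>();,"\'')
            addresses.append(cleaned)
    return addresses


def report(path):
    """Print the total once, then each address on its own line."""
    addresses = []
    # 'with' closes the file automatically when the block ends.
    with open(path) as fhand:
        for line in fhand:
            addresses.extend(extract_addresses(line))
    # The count is printed after the loop, so it appears only once.
    print('There were', len(addresses), 'email addresses in', path)
    for address in addresses:
        print(address)
    return addresses
```

Calling report('mbox.txt') would then print a single count line followed by the addresses. This counts every occurrence, so an address that appears on several lines is counted several times.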
Upvotes: 0