Reputation: 86
I am taking the Python for Everybody course on Coursera, and I have just learned how to access a file on the web with Python.
What I am trying to do here is extract the email addresses from the lines that start with From:, but I get no output.
I know the file contains emails on lines starting with From:
because I have already done this with the file-handling approach. It stops working when I try it on the file on the server, so I guess it has something to do with whitespace.
Any help would be appreciated, I am stuck.
import socket
import re
dic = dict()
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    mysock.connect(('data.pr4e.org', 80))
except:
    print("Can't find the server.\nCheck your internet Connection")
cmd = 'GET http://data.pr4e.org/mbox-short.txt HTTP/1.0\r\n\r\n'.encode()
try:
    mysock.send(cmd)
except:
    print("Connection Lost:\nCheck your Internet Connection")
while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    data = data.decode()
    data = data.rstrip()
    k = re.findall('^From:.(\S+@\S+)', data)
    if (len(k)) > 0:
        print(k)
This is the link where you can download the file.
Upvotes: 2
Views: 165
Reputation: 86
Well, I found a better way to do this. It is easier and more efficient with the urllib.request library.
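import urllib.request, urllib.parse, urllib.error
import re
fhand = urllib.request.urlopen('http://data.pr4e.org/mbox-short.txt')
for line in fhand:
    # urlopen yields bytes, so decode each line before applying a str pattern
    line = line.decode()
    k = re.findall(r'(?m)^From:\s*(\S+@\S+)', line)
    # a single line holds at most one match, so test for a non-empty result
    if len(k) > 0:
        print(k)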
Upvotes: -1
Reputation: 626952
You may get the emails using
k = re.findall(r'(?m)^From:\s*(\S+@\S+)', data)
See the regex demo.
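As a quick offline check, here is a minimal sketch of the pattern run against a few made-up lines in the style of mbox-short.txt. The sample text and addresses are assumptions for illustration, not taken from the file. It also runs the same pattern without (?m), in which case ^ is only tried at the very start of the string:

```python
import re

# Made-up sample in the style of an mbox file; the addresses are illustrative only.
sample = (
    "From alice@example.com Sat Jan  5 09:14:16 2008\n"
    "Return-Path: <postmaster@example.org>\n"
    "From: alice@example.com\n"
    "Subject: first message\n"
    "\n"
    "From bob@example.net Fri Jan  4 18:10:48 2008\n"
    "From: bob@example.net\n"
)

# With (?m), ^ matches at the start of every line, not just the start of the string.
with_flag = re.findall(r'(?m)^From:\s*(\S+@\S+)', sample)

# Without (?m), ^ only matches at position 0, where the text is "From " (no colon).
without_flag = re.findall(r'^From:\s*(\S+@\S+)', sample)

print(with_flag)     # ['alice@example.com', 'bob@example.net']
print(without_flag)  # []
```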
Details
(?m)^ - start of a line
From: - a literal string
\s* - 0+ whitespaces
(\S+@\S+) - Capturing group 1 (the output of re.findall will only contain this value): one or more non-whitespace chars, @ and one or more non-whitespace chars.
Upvotes: 3
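One caveat when dropping this pattern into the socket loop from the question: there, data is a single 512-byte chunk, so an address that straddles two recv() calls can be missed, and the first chunk also carries the HTTP response headers. A minimal sketch of one way around that, assuming the whole response fits comfortably in memory, is to join the chunks first and search once. The helper name emails_from_chunks and the fake chunks below are hypothetical, used only so the example runs without a network connection:

```python
import re

def emails_from_chunks(chunks):
    """Join raw response chunks, drop the HTTP headers, and return From: addresses.

    `chunks` is any iterable of bytes objects, e.g. the values returned by
    successive mysock.recv(512) calls. (Hypothetical helper for illustration.)
    """
    raw = b"".join(chunks)
    # HTTP headers end at the first blank line; keep only the body if it is present.
    _, sep, body = raw.partition(b"\r\n\r\n")
    text = (body if sep else raw).decode()
    return re.findall(r'(?m)^From:\s*(\S+@\S+)', text)

# Fake chunks standing in for recv() results; each address is split across two chunks.
fake_chunks = [
    b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nFrom: ali",
    b"ce@example.com\nSubject: hi\nFrom: bob@exam",
    b"ple.net\n",
]
print(emails_from_chunks(fake_chunks))  # ['alice@example.com', 'bob@example.net']
```

In the question's loop, this would mean appending each data value to a list before the break and calling the helper once after the loop ends.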