Using Python to Access Web Data with Regular Expressions is not working

Question

I am taking the Python for Everybody course on Coursera, and I just learned how to access a file on the web with Python.

What I am trying to do is extract the email address from every line that starts with From:, but I get no output at all.

The file definitely has email addresses on lines starting with From:, because the same extraction works when I read a local copy with ordinary file handling. It only fails when I read the file from the server, so I suspect the problem has something to do with whitespace.

Any help would be appreciated. I am stuck.

import socket
import re
dic = dict()
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    mysock.connect(('data.pr4e.org', 80))
except:
    print("Can't find the server.\nCheck your internet Connection")
cmd = 'GET http://data.pr4e.org/mbox-short.txt HTTP/1.0\r\n\r\n'.encode()
try:
    mysock.send(cmd)
except:
    print("Connection Lost:\nCheck your Internet Connection")
while True:
    data = mysock.recv(512)
    if len(data)  < 1:
        break
    data = data.decode()
    data = data.rstrip()
    k = re.findall('^From:.(\S+@\S+)', data)
    if (len(k)) > 0:
        print(k)

This is the link from which you can download the file: http://data.pr4e.org/mbox-short.txt

Wiktor Stribiżew · Accepted Answer

You may get the emails using

k = re.findall(r'(?m)^From:\s*(\S+@\S+)', data)

See the regex demo.
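The effect of the (?m) flag can also be checked directly in Python. This is a minimal sketch on a made-up sample string (the addresses are placeholders, not lines from mbox-short.txt). Without the flag, ^ only matches at the very beginning of the string, so the original pattern finds nothing unless the text happens to start with From:.

```python
import re

# Made-up mbox-style text; the addresses are placeholders.
sample = (
    "Return-Path: <postmaster@example.org>\n"
    "From: alice@example.org\n"
    "Subject: test\n"
    "From: bob@example.com\n"
)

# Without (?m), ^ anchors only at position 0, which holds "Return-Path:",
# so no match is possible.
without_flag = re.findall(r'^From:\s*(\S+@\S+)', sample)

# With (?m) (equivalent to passing re.MULTILINE), ^ also matches right after
# every newline, so each "From:" line is considered.
with_flag = re.findall(r'(?m)^From:\s*(\S+@\S+)', sample)

print(without_flag)  # []
print(with_flag)     # ['alice@example.org', 'bob@example.com']
```

Passing the flag as an argument, re.findall(r'^From:\s*(\S+@\S+)', data, re.MULTILINE), gives the same result as the inline (?m).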

Details

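(?m)^ - start of a line: the inline (?m) flag (the same as re.MULTILINE) makes ^ match after every newline, not only at the start of the whole string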
From: - a literal string
\s* - zero or more whitespace characters
(\S+@\S+) - Capturing group 1 (the output of re.findall will only contain this value): one or more non-whitespace chars, @ and one or more non-whitespace chars.
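Beyond the regex itself, the loop in the question runs re.findall on each 512-byte chunk separately. A From: line that straddles two recv() calls can then be missed, and decoding a chunk that ends in the middle of a multi-byte character can raise UnicodeDecodeError. Below is a minimal sketch that collects the whole response first and only then decodes and searches it. The helper names fetch_http10 and extract_from_emails are hypothetical. The sketch also assumes that the HTTP headers end at the first blank line (\r\n\r\n). Its request line uses the path plus a Host header instead of the absolute URL from the question; that is a choice made here, not something the course requires.

```python
import re
import socket

# Pattern from the accepted answer: (?m) makes ^ match at every line start.
FROM_EMAIL = re.compile(r'(?m)^From:\s*(\S+@\S+)')


def fetch_http10(host, path, port=80, chunk_size=512):
    """Hypothetical helper: send a bare HTTP/1.0 GET and return the raw response bytes.

    All chunks are collected before any decoding or matching, so lines split
    across recv() calls are reassembled first.
    """
    with socket.create_connection((host, port)) as sock:
        request = f'GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n'
        sock.sendall(request.encode())
        chunks = []
        while True:
            data = sock.recv(chunk_size)
            if not data:  # empty bytes: the server closed the connection
                break
            chunks.append(data)
    return b''.join(chunks)


def extract_from_emails(response_bytes):
    """Hypothetical helper: drop HTTP headers, decode once, return all From: addresses."""
    # Assumption: headers end at the first blank line (\r\n\r\n); the body follows it.
    _, _, body = response_bytes.partition(b'\r\n\r\n')
    text = body.decode('utf-8', errors='replace')
    return FROM_EMAIL.findall(text)


if __name__ == '__main__':
    raw = fetch_http10('data.pr4e.org', '/mbox-short.txt')
    for email in extract_from_emails(raw):
        print(email)
```

Keeping the network code and the parsing in separate functions lets the regex step be tested on a hard-coded byte string without a connection. Note that lines such as "From alice@... Sat Jan 5" (no colon) are not matched, because the pattern requires the literal From:.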
