MAS

Reputation: 4993

regular expression to extract part of email address

I am trying to use a regular expression to extract the part of an email address between the "@" sign and the "." character. This is how I am currently doing it, but I can't get the right result.

company = re.findall('^From:.+@(.*).',line)

Gives me:

['@iupui.edu']

I want to get rid of the .edu

Upvotes: 2

Views: 2268

Answers (4)

Alex

Reputation: 8303

A simple example would be:

>>> import re
>>> re.findall(".*(?<=\@)(.*?)(?=\.)", "From: [email protected]")
['moo']
>>> re.findall(".*(?<=\@)(.*?)(?=\.)", "From: [email protected]")
['moo-hihihi']

This matches the hostname regardless of how the line begins, because the leading .* is greedy and consumes everything before the last @.
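
To make the effect of the greedy prefix concrete, here is a small sketch that runs the pattern with and without the leading `.*`. The sample line is made up for illustration and holds two addresses; it does not come from the question.

```python
import re

# Hypothetical sample line with two addresses (an assumption for this demo).
line = "From: a@b.c via x@mail.example.org"

# With the greedy ".*" prefix, the match is anchored at the last "@".
with_prefix = re.findall(r".*(?<=@)(.*?)(?=\.)", line)

# Without the prefix, findall reports the label after every "@".
without_prefix = re.findall(r"(?<=@)(.*?)(?=\.)", line)

print(with_prefix)     # ['mail']
print(without_prefix)  # ['b', 'mail']
```

So the prefix matters mainly when a line can contain more than one "@".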

Upvotes: 2

Jan Eglinger

Reputation: 4090

To match a literal . in your regex, you need to use \., so your code should look like this:

company = re.findall(r'^From:.+@(.*)\.', line)
#                                   ^ this position was wrong

Note that this will always match the last occurrence of . in your string, because (.*) is greedy. If you want to match the first occurrence, you need to exclude any . from your capturing group:

company = re.findall(r'^From:.+@([^.]*)\.', line)
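
The difference between the two patterns only shows up when the domain has more than one dot. The sketch below compares them on a made-up header line (the address is an assumption, not taken from the question):

```python
import re

# Hypothetical header line with a multi-label domain.
line = "From: someone@mail.example.org"

# Greedy group: backtracks only as far as the LAST dot.
greedy = re.findall(r"^From:.+@(.*)\.", line)

# Negated character class: stops at the FIRST dot after the "@".
first_label = re.findall(r"^From:.+@([^.]*)\.", line)

print(greedy)       # ['mail.example']
print(first_label)  # ['mail']
```

For a domain with a single dot, such as the one in the question, both patterns return the same result.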

Upvotes: 3

Padraic Cunningham

Reputation: 180391

You could just split and find:

s = " [email protected] I"
s = s.split("@", 1)[-1]
print(s[:s.find(".")])

Or use split alone if the string is not always going to contain a match:

s = s.split("@", 1)[-1].split(".", 1)[0]

If the string always contains a match, find will be the fastest:

i = s.find("@")
s = s[i+1:s.find(".", i)]
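
One caveat with the find approach: str.find returns -1 when the character is missing, and the slice then silently returns the wrong text instead of failing. A minimal sketch of a guarded version follows; the helper name domain_label and the sample inputs are hypothetical.

```python
def domain_label(text):
    """Return the text between the first "@" and the next ".", or None."""
    at = text.find("@")
    if at == -1:
        return None  # no "@" in the string
    dot = text.find(".", at)
    if dot == -1:
        return None  # no "." after the "@"
    return text[at + 1:dot]


print(domain_label("From: user@example.com"))  # example
print(domain_label("no address here"))         # None
```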

Upvotes: 1

Rahul Tripathi

Reputation: 172378

You can try this:

(?<=\@)(.*?)(?=\.)
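
As a rough usage sketch, the pattern can be compiled once and applied with re.search, which returns None when the line has no match. The names LABEL and first_label and the sample inputs are assumptions for illustration.

```python
import re

# Lookbehind for "@", lazy capture, lookahead for ".".
LABEL = re.compile(r"(?<=@)(.*?)(?=\.)")


def first_label(text):
    """Return the first label after an "@", or None if nothing matches."""
    match = LABEL.search(text)
    return match.group(1) if match else None


print(first_label("From: user@example.com"))  # example
print(first_label("no at sign."))             # None
```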

Upvotes: 3