inertia

Reputation: 4127

How to split email with python regex?

I'm trying to isolate the value that comes after the "+" sign in an email address. For example, if I have "something+company@example.com", I want to get the value "company". It seems like the + sign messes up the regex, and I don't know where to go from here.

Here is what I wrote using re:

re.findall(r'something+(.*?)@',st)

Upvotes: 0

Views: 550

Answers (2)

Martin Tournoij

Reputation: 27822

The problem with your regexp is that + is a special character meaning "repeat the previous item one or more times". In your case, g+ matches the g once, and the (.*?) group then captures the literal + along with the text after it, so you get '+company' instead of 'company'.

The solution is to escape the + by preceding it with a \:

>>> import re
>>> email = 'something+company@example.com'
>>> re.findall(r'something\+(.*?)@', email)
['company']
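
To make the difference concrete, here is a small side-by-side sketch of the unescaped and escaped patterns. The address is a made-up placeholder (example.com is not from the question), so treat it as an assumption:

```python
import re

# Hypothetical address; example.com is only a placeholder domain.
email = "something+company@example.com"

# Unescaped: "g+" means "one or more g", so the literal "+" is left over
# and ends up inside the capture group.
unescaped = re.findall(r"something+(.*?)@", email)

# Escaped: "\+" matches a literal plus sign, so the group starts after it.
escaped = re.findall(r"something\+(.*?)@", email)

print(unescaped)  # ['+company']
print(escaped)    # ['company']
```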

Having said that, you don't really need a regular expression here.

Your goal is to get all text between the first + and the first @, which you can do with:

>>> email = 'something+company@example.com'
>>> email[email.find('+')+1:email.find('@')]
'company'

Note that this code will give unexpected results if there's no + or @, so you'll probably want to add a check around this (e.g. if '+' in email: ...).
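
One way such a check might look is sketched below. The function name plus_tag and the choice to return None are my own assumptions, not anything from the question. The underlying issue is that str.find returns -1 when nothing is found, so without a guard an address with no + would silently give back the whole local part:

```python
def plus_tag(email):
    """Return the text between the first '+' and the next '@', or None.

    Hypothetical helper: the name and the None return value are
    illustrative choices only.
    """
    plus = email.find('+')
    if plus == -1:
        return None  # no '+' in the address at all
    at = email.find('@', plus + 1)  # only look for '@' after the '+'
    if at == -1:
        return None  # no '@' following the '+'
    return email[plus + 1:at]


# Placeholder addresses for illustration.
print(plus_tag('something+company@example.com'))  # company
print(plus_tag('something@example.com'))          # None
```

Searching for the @ only after the + also avoids a reversed slice when a + happens to appear in the domain part rather than the local part.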

In addition, you can actually have quoted @s and such in emails, so this is not 100% RFC-compliant. However, last time I checked many MTAs and email clients don't support that anyway, so it's not really something you need to worry about as such.

Upvotes: 1

Ozgur Vatansever

Reputation: 52093

+ acts like a special character (a repetition operator) when defining a regular expression. You need \ to escape it:

>>> import re
>>> st = "something+company@example.com"
>>> re.findall(r'something\+(.*?)@', st)
['company']
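
If the literal prefix comes from a variable rather than being typed into the pattern, re.escape can do the escaping for you. A minimal sketch, assuming a hypothetical prefix variable and a placeholder address:

```python
import re

# Hypothetical literal prefix; it may contain regex metacharacters such as "+".
prefix = "something+"

# re.escape backslash-escapes the special characters in the prefix,
# so the "+" is treated as a literal plus sign.
pattern = re.escape(prefix) + r"(.*?)@"

st = "something+company@example.com"  # placeholder address
result = re.findall(pattern, st)
print(result)  # ['company']
```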

Upvotes: 2
