Reputation: 4127
I'm trying to isolate the value that comes after the "+" sign in an email address. For example, if I have "[email protected]", I want to get the value "company". The + sign seems to break my regex, and I don't know where to go from here.
Here is what I wrote using re:
re.findall(r'something+(.*?)@',st)
Upvotes: 0
Views: 550
Reputation: 27822
The problem with your regexp is that + is a special character meaning "repeat the previous character one or more times". In your case, g+ would match the g one time, and then the (.*?) would capture the literal + along with the text after it.
The solution is to escape the + by preceding it with a \:
>>> email = '[email protected]'
>>> re.findall(r'something\+(.*?)@', email)
['company']
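To make the difference concrete, here is a small side-by-side sketch of the unescaped and escaped patterns. The address below is a hypothetical stand-in chosen for illustration, not the one from the question.

```python
import re

# Hypothetical address, used only for illustration.
email = "something+company@example.com"

# Unescaped: "g+" means "one or more g", so the literal "+" is left over
# and ends up inside the capture group.
unescaped = re.findall(r"something+(.*?)@", email)

# Escaped: "\+" matches a literal plus sign, so only the tag is captured.
escaped = re.findall(r"something\+(.*?)@", email)

print(unescaped)  # ['+company']
print(escaped)    # ['company']
```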
Having said that, you don't really need a regular expression here.
Your goal is to get all text between the first + and the first @, which you can do with:
>>> email = '[email protected]'
>>> email[email.find('+')+1:email.find('@')]
'company'
Note that this code will give unexpected results if there's no + or @, because str.find returns -1 when nothing is found, so you'll probably want to add a check around this (e.g. if '+' in email: ...).
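One way such a check might look is sketched below. The function name plus_tag and the sample addresses are made up for illustration, and unlike the one-liner above it looks for the first @ that comes after the +, which is an assumption about what you want.

```python
from typing import Optional


def plus_tag(email: str) -> Optional[str]:
    """Return the text between the first '+' and the next '@', or None."""
    plus = email.find("+")
    if plus == -1:
        # No '+' at all: there is no tag to extract.
        return None
    at = email.find("@", plus + 1)
    if at == -1:
        # No '@' after the '+': not a usable address for this purpose.
        return None
    return email[plus + 1:at]


# Hypothetical sample addresses.
print(plus_tag("something+company@example.com"))  # company
print(plus_tag("user@example.com"))               # None
```

Without the check, the slice for an address with no + starts at index 0 (since -1 + 1 == 0) and silently returns the whole local part instead of failing.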
In addition, you can actually have quoted @s and such in email addresses, so this is not 100% RFC-compliant. However, last time I checked, many MTAs and email clients don't support that anyway, so it's not really something you need to worry about.
Upvotes: 1
Reputation: 52093
+ acts as a special character (a repetition operator) in a regular expression. You need \ to escape it:
>>> st = "[email protected]"
>>> re.findall(r'something\+(.*?)@', st)
['company']
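If the text before the tag comes from a variable rather than being typed into the pattern by hand, re.escape can add the backslashes for you. This is only a sketch: the prefix and address are hypothetical values picked for illustration.

```python
import re

# Hypothetical values, used only for illustration.
prefix = "something+"
st = "something+company@example.com"

# re.escape backslash-escapes regex metacharacters such as "+",
# so the prefix is matched literally.
pattern = re.escape(prefix) + r"(.*?)@"

match = re.search(pattern, st)
tag = match.group(1) if match else None
print(tag)  # company
```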
Upvotes: 2