dokondr

Reputation: 3539

Python regex to remove emails from string

I need to remove email addresses (and any other token containing @) from a string, so that:

inp = 'abc [email protected] 123 any@www foo @ bar 78@ppp @5555 aa@111'

results in:

out = 'abc 123 foo bar'

What regex should I use?

In [148]: e = '[^\@]\@[^\@]'
In [149]: pattern = re.compile(e)
In [150]: pattern.sub('', s)  
Out[150]: 'one aom 123 4two'
In [151]: s
Out[151]: 'one ab@com 123 4 @ two'

This does not work for me.

Upvotes: 10

Views: 26348

Answers (5)

Simone

Reputation: 615

In case somebody needs to remove e-mail addresses that include hyphens or dots, such as [email protected] or [email protected]: below is an amended solution, based on what TBhavnani suggested, that removes these too.

import re

text = ['[email protected] is my e-mail','My e-mail is [email protected]']

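for line in text:
    line = line.strip()
    # The local part may contain any number of ".xxx" or "-xxx" groups
    # (the trailing * also lets plain addresses without dots or hyphens match).
    line = re.sub(r'\w*(?:[.\-]\w+)*@[A-Za-z]*\.?[A-Za-z0-9]*', "", line)
    print(line)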

This gives the following output:

 is my e-mail
My e-mail is 

Upvotes: 0

TBhavnani

Reputation: 771

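Adding this since nobody has posted a regex for the whole input yet:

import re

text = 'abc [email protected] 123 any@www foo @ bar 78@ppp @5555 aa@111'

# Remove every token built around an @
required_output = re.sub(r'[A-Za-z0-9]*@[A-Za-z]*\.?[A-Za-z0-9]*', "", text)

# Collapse the leftover runs of whitespace into single spaces
required_output = " ".join(required_output.split())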

Upvotes: 4

Gawil

Reputation: 1211

Replace :
\S*@\S*\s?
by ''

Some explanation:
\S* : match as many non-space characters as you can
@ : then a @
\S* : then another sequence of non-space characters
\s? : and finally a space, if there is one. Note that the '?' is needed to match an address at the end of the line. Because '?' is greedy, if there is a space, it will always be matched.
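
For anyone who wants to try this from Python, here is a minimal sketch of the replacement above using re.sub. The helper name strip_emails and the address user@example.com are placeholders I made up for illustration (the real address in the question is hidden). Since the space before the last removed token is left behind, the sketch also strips trailing whitespace; that step is my assumption about the desired result, not part of the pattern itself.

```python
import re

# Pattern from this answer: a run of non-space characters containing an @,
# followed by one optional whitespace character.
EMAIL_LIKE = re.compile(r'\S*@\S*\s?')

def strip_emails(text):
    """Remove every whitespace-delimited token containing an @ (hypothetical helper)."""
    # rstrip() drops the space left behind when the final token is removed.
    return EMAIL_LIKE.sub('', text).rstrip()

# user@example.com is a placeholder address, not taken from the question.
inp = 'abc user@example.com 123 any@www foo @ bar 78@ppp @5555 aa@111'
print(strip_emails(inp))  # abc 123 foo bar
```

Without the rstrip() call the result would be 'abc 123 foo bar ' with a trailing space, because only the space after each match is consumed, never the one before it.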

Upvotes: 25

stamaimer

Reputation: 6495

out = ' '.join([item for item in inp.split() if '@' not in item])

Upvotes: 1

Mike Driscoll

Reputation: 33111

I personally prefer doing string parsing myself. Let's try splitting the string and getting rid of the items that have the @ symbol:

inp = 'abc [email protected] 123 any@www foo @ bar 78@ppp @5555 aa@111'
items = inp.split()

Now we can do something like this:

>>> [i for i in items if '@' not in i]
['abc', '123', 'foo', 'bar']

That gets us almost there. Let's modify it a bit more to add a join:

>>> ' '.join([i for i in inp.split() if '@' not in i])
'abc 123 foo bar'

It may not be RegEx, but it works for the input you gave.

Upvotes: 3
