How to return everything in a string that is not matched by a regex?

Question

I have a string and a regular expression that matches portions of the string. I want to return a string representing what's left of the original string after all matches have been removed.

import re

string="""
Good Morning, 
As per last email"""

pattern = r'<[a-zA-Z0-9 ="/\-:;.]*>'

re.findall(pattern, string)

['',
 '
',
 '',
 '',
 '
',
 '',
 '']

desired_string = "Good Morning, As per last email"

Andy · Accepted Answer

Instead of a regular expression, use an HTML parser like BeautifulSoup. It looks like you are trying to strip the HTML elements and get the underlying text.

from bs4 import BeautifulSoup

string="""
Good Morning, 
As per last email"""

soup = BeautifulSoup(string, 'lxml')

print(soup.get_text())

This outputs:

Good Morning, As per last email

One thing to notice is that the &nbsp; was changed to a space character using this method.
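If you do want the literal regex route from the question (drop everything the pattern matches and keep the rest), a minimal sketch with re.sub could look like the following. The sample markup is an assumption, since the original HTML is not shown in the post, and strip_matches is a hypothetical helper name. html.unescape is used here only to decode entities such as &nbsp;, which a tag-only pattern leaves untouched.

```python
import html
import re

# Hypothetical sample input: the markup from the original post is not shown,
# so this HTML is an assumption made for illustration.
sample = '<p>Good Morning,&nbsp;</p><br><p>As per last email</p>'

# Pattern taken from the question.
pattern = r'<[a-zA-Z0-9 ="/\-:;.]*>'


def strip_matches(text, pattern):
    """Return text with every match of pattern removed."""
    return re.sub(pattern, '', text)


without_tags = strip_matches(sample, pattern)

# Decode HTML entities (&nbsp; becomes U+00A0), then swap the
# non-breaking space for an ordinary one.
cleaned = html.unescape(without_tags).replace('\xa0', ' ')

print(cleaned)
```

This only works for markup the pattern happens to cover. Anything it misses (comments, attributes with other characters, script blocks) stays in the output, which is why a parser is usually the safer choice.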
