Reputation: 453
My initial string contains a <span>, some content in between, and a closing </span></span. I would like to remove that whole piece from the string (the span tags and everything inside them). How can I do that?
Part of the string that needs to be removed: "<span class="_5mfr"><span class="_6qdm" style='height: 16px; width: 16px; font-size: 16px; background-image: url("https://static.xx.fbcdn.net/images/emoji.php/v9/t81/1/16/")
+14 variable strings+</span></span
I would like to remove the whole piece shown above.
Upvotes: 1
Views: 164
Reputation: 6486
You can remove the span tags by replacing every match of the regex with an empty string, as shown below:
import re
regex = r"(<span.+?>)|(<\/span>)"
test_str = "<span class=\\\"_5mfr\\\"><span class=\\\"_6qdm\\\" style='height: 16px; width: 16px; font-size: 16px; background-image: url(\\\"static.xx.fbcdn.net/images/emoji.php/v9/t81/1/16/…\\\")'>© Dasamoolam Damu (Troll Malayalam)ഹൗ ക്രൂവൽ<span class=\\\"_5mfr\\\"><span class=\\\"_6qdm\\\" style='height: 16px; width: 16px; font-size: 16px; background-image: url(\\\"static.xx.fbcdn.net/images/emoji.php/v9/td7/1/16/…\\\")'></span></span></span></span>"
print(re.sub(regex, '', test_str))
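A tag-only pattern like the one above deletes the opening and closing tags but leaves the text between them in place. Here is a minimal sketch of that behaviour on a shorter input; the name `strip_span_tags` and the sample string are made up for illustration and are not from the question.

```python
import re

# Hypothetical helper: matches <span ...> and </span>, nothing else.
# \b keeps the pattern from matching other tag names that start with "span".
SPAN_TAG = re.compile(r"</?span\b[^>]*>")

def strip_span_tags(text):
    """Remove span tags only; the text inside the spans is kept."""
    return SPAN_TAG.sub("", text)

# Illustrative sample, not the asker's data.
sample = 'before <span class="a"><span class="b">inner</span></span> after'
print(strip_span_tags(sample))  # prints: before inner after
```

If the inner text has to go as well, the tags need to be paired up rather than stripped one by one.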
Upvotes: 0
Reputation: 7627
import re
txt = 'Iam a good boy <span>some blahblahblah </span</span and my name is john'
print(re.sub(r'<span>.*</span</span ', '', txt))
Prints:
Iam a good boy and my name is john
For the updated question:
import re
txt = """<span class="_5mfr"><span class="_6qdm" style='height: 16px; width: 16px; font-size: 16px; background-image: url("https://static.xx.fbcdn.net/images/emoji.php/v9/t81/1/16/")+14 variable strings+</span></span"""
print(re.sub(r'<span [^<>]*?</span>?</span', '', txt))
# prints: <span class="_5mfr">
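The pattern above is tied to the exact shape of the pasted sample. If the real input has properly closed, possibly nested span tags, one possible approach is to track nesting depth and keep only the text that sits outside every span. This is a rough sketch under that assumption: `remove_span_blocks` is a hypothetical name, and the code expects every tag to end with `>`, which the sample in the question does not.

```python
import re

# Hypothetical helper. Group 1 is "/" for a closing tag, empty for an opening tag.
SPAN_TAG = re.compile(r"<(/?)span\b[^<>]*>")

def remove_span_blocks(text):
    """Drop every <span>...</span> block, nested ones included, with its contents."""
    kept = []
    depth = 0   # number of spans currently open
    pos = 0     # end of the previous tag
    for m in SPAN_TAG.finditer(text):
        if depth == 0:
            kept.append(text[pos:m.start()])  # text outside any span
        if m.group(1):
            depth = max(depth - 1, 0)         # closing tag; ignore stray ones
        else:
            depth += 1                        # opening tag
        pos = m.end()
    if depth == 0:
        kept.append(text[pos:])               # tail after the last tag
    return "".join(kept)

# Illustrative sample, not the asker's data.
print(remove_span_blocks('keep <span class="a"><span class="b">x</span></span> this'))
# prints: keep  this   (the spaces on both sides of the block remain)
```

Text after an unclosed span is dropped by this sketch, so it is only suitable when the markup is balanced.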
Upvotes: 2
Reputation: 71610
Use BeautifulSoup:
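from bs4 import BeautifulSoup
string = 'Iam a good boy <span>some blahblahblah </span> and my name is john'  # example input
soup = BeautifulSoup(string, 'html.parser')
# Detach every <span> (and everything inside it) from the tree
for x in soup.find_all('span'):
    x.extract()
# soup.string is None when several nodes remain, so serialize the tree instead
print(str(soup))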
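If installing BeautifulSoup is not an option, the same parser-based idea can be approximated with the standard library's `html.parser`. This is only a sketch: `SpanStripper` and `text_outside_spans` are hypothetical names, and unlike the BeautifulSoup version it returns plain text, so any other tags in the input are dropped too.

```python
from html.parser import HTMLParser

class SpanStripper(HTMLParser):
    """Hypothetical helper: collects text that is not inside any <span>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0    # number of spans currently open
        self.parts = []   # text fragments found outside spans

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.parts.append(data)

def text_outside_spans(html):
    parser = SpanStripper()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)

# Illustrative sample, not the asker's data.
print(text_outside_spans('Hello <span class="a"><span class="b">x</span></span>world'))
# prints: Hello world
```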
Upvotes: 1