Reputation: 1166
I have an HTML body. Here is a sample extract:
body = 'Hi what <a href="url_example_1" other-attribute>is</a> your <a href="url_example2" other-attribute>name</a>?....other stuffs'
The real body could be much longer, with other HTML tags and possibly other <a> tags too.
I also have one URL I want to remove from the body:
url_to_remove = 'url_example_1'
Is there a regex or some other way to get a new body in which the <a> tag pointing to url_to_remove is removed, while its text is kept?
My new body should be:
new_body = 'Hi what is your <a href="url_example2" other-attribute>name</a>?....other stuffs'
Upvotes: 1
Views: 151
Reputation: 605
Try this:
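from bs4 import BeautifulSoup

body = 'HTML code here'
to_delete = 'deprecated url'

# Name the parser explicitly; otherwise bs4 picks one and emits a warning.
soup = BeautifulSoup(body, 'html.parser')
for element in soup.find_all('a'):
    # .get() returns None instead of raising KeyError for <a> tags without href.
    if element.get('href') == to_delete:
        # Replace the whole <a> tag with its text content.
        element.replace_with(element.text)
# Convert the soup back to a string to get the new body.
body = str(soup)
print(body)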
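Since the question also asks about a regex, here is a minimal standard-library sketch of that route. The helper name remove_link is made up for illustration, and the pattern assumes the href value is quoted, matches the URL exactly, and that <a> tags are not nested. A regex is fragile on arbitrary HTML, so the parser approach above is usually the safer choice.

```python
import re

def remove_link(body, url):
    """Replace every <a ... href="url" ...>text</a> with just its text.

    Hypothetical helper: assumes a quoted href that equals `url` exactly.
    """
    pattern = re.compile(
        r'<a\s+(?:[^>]*?\s)?href\s*=\s*(["\'])'  # opening tag up to the href quote
        + re.escape(url)                          # the URL, matched literally
        + r'\1[^>]*>(.*?)</a>',                   # rest of the tag, inner text, closing tag
        flags=re.IGNORECASE | re.DOTALL,
    )
    # Group 1 is the quote character, group 2 is the link's inner content.
    return pattern.sub(r'\2', body)

body = ('Hi what <a href="url_example_1" other-attribute>is</a> your '
        '<a href="url_example2" other-attribute>name</a>?....other stuffs')
url_to_remove = 'url_example_1'
new_body = remove_link(body, url_to_remove)
print(new_body)
```

With the sample body from the question, only the first link is unwrapped and the second <a> tag is left as it is.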
Upvotes: 2