aletede91

Reputation: 1166

Remove <a> tag from a string by HREF attribute

I have an HTML body. A sample extract:

body = 'Hi what <a href="url_example_1" other-attribute>is</a> your <a href="url_example2" other-attribute>name</a>?....other stuffs'

The real body could be much longer, with other HTML tags and possibly other <a> tags too.

I also have one URL I want to remove from the body:

url_to_remove = 'url_example_1'

Is there a regex, or some other way, to produce a new body with the <a> tag for url_to_remove removed while keeping its text?

My new body should be:

new_body = 'Hi what is your <a href="url_example2" other-attribute>name</a>?....other stuffs'

Upvotes: 1

Views: 151

Answers (1)

shoytov

Reputation: 605

Try this:

from bs4 import BeautifulSoup

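body = 'HTML code here'
to_delete = 'deprecated url'
# Name the parser explicitly; html.parser does not wrap a fragment in <html>/<body>
soup = BeautifulSoup(body, "html.parser")
# Select only the <a> tags whose href equals the URL to remove
# (this also skips <a> tags with no href, which element['href'] would crash on)
for element in soup.find_all("a", href=to_delete):
    # Replace the whole tag with its text content
    element.replace_with(element.text)
# Convert back to a string so body is not left as a BeautifulSoup object
body = str(soup)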

print(body)
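
Since the question also asks about a regex: for simple markup like the sample, something along these lines might be enough. Treat it as a rough sketch, not a general HTML solution. remove_link is a hypothetical helper name, and it assumes the href value is quoted, the <a> tags are not nested, and no attribute value contains a ">" character. For anything messier, the BeautifulSoup approach is the safer choice.

```python
import re


def remove_link(body: str, url: str) -> str:
    """Strip <a> tags whose href equals url, keeping their inner text.

    Hypothetical helper; assumes quoted href values and non-nested <a> tags.
    """
    pattern = re.compile(
        # Opening <a ...> tag containing href="url" or href='url'
        r'<a\b[^>]*?\shref\s*=\s*(["\'])' + re.escape(url) + r'\1[^>]*>'
        # Inner content (non-greedy) up to the first closing tag
        r'(.*?)</a>',
        flags=re.IGNORECASE | re.DOTALL,
    )
    # Keep only the inner content (group 2) of each matching link
    return pattern.sub(r'\2', body)


body = 'Hi what <a href="url_example_1" other-attribute>is</a> your <a href="url_example2" other-attribute>name</a>?....other stuffs'
new_body = remove_link(body, 'url_example_1')
print(new_body)
```

re.escape matters here because URLs usually contain characters such as "." and "?" that have a special meaning in a regex.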

Upvotes: 2
