Reputation: 3564
How would I go about writing a function (with BeautifulSoup or otherwise) that would replace all instances of one HTML tag with another. For example:
text = "<p>this is some text<p><bad class='foo' data-foo='bar'> with some tags</bad><span>that I would</span><bad>like to replace</bad>"
new_text = replace_tags(text, "bad", "p")
print(new_text) # "<p>this is some text<p><p class='foo' data-foo='bar'> with some tags</p><span>that I would</span><p>like to replace</p>"
I tried this, but preserving the attributes of each tag is a challenge:
def replace_tags(string, old_tag, new_tag):
soup = BeautifulSoup(string, "html.parser")
nodes = soup.findAll(old_tag)
for node in nodes:
new_content = BeautifulSoup("<{0}>{1}</{0}".format(
new_tag, node.contents[0],
))
node.replaceWith(new_content)
string = soup.body.contents[0]
return string
Any idea how I could just replace the tag element itself in the soup? Or, even better, does anyone know of a library/utility function that'll handle this more robustly than something I'd write?
Thank you!
Upvotes: 3
Views: 1362
Reputation: 7238
Actually it's pretty simple. You can directly use old_tag.name = new_tag
.
def replace_tags(string, old_tag, new_tag):
soup = BeautifulSoup(string, "html.parser")
for node in soup.findAll(old_tag):
node.name = new_tag
return soup # or return str(soup) if you want a string.
text = "<p>this is some text<p><bad class='foo' data-foo='bar'> with some tags</bad><span>that I would</span><bad>like to replace</bad>"
new_text = replace_tags(text, "bad", "p")
print(new_text)
Output:
<p>this is some text<p><p class="foo" data-foo="bar"> with some tags</p><span>that I would</span><p>like to replace</p></p></p>
From the documentation:
Every tag has a name, accessible as
.name
:tag.name # u'b'
If you change a tag’s name, the change will be reflected in any HTML markup generated by Beautiful Soup:
tag.name = "blockquote" tag # <blockquote class="boldest">Extremely bold</blockquote>
Upvotes: 4