Robert Townley

Reputation: 3564

Python function to replace all instances of a tag with another

How would I go about writing a function (with BeautifulSoup or otherwise) that replaces all instances of one HTML tag with another? For example:

text = "<p>this is some text<p><bad class='foo' data-foo='bar'> with some tags</bad><span>that I would</span><bad>like to replace</bad>"
new_text = replace_tags(text, "bad", "p")
print(new_text)  # "<p>this is some text<p><p class='foo' data-foo='bar'> with some tags</p><span>that I would</span><p>like to replace</p>"

I tried this, but preserving the attributes of each tag is a challenge:

def replace_tags(string, old_tag, new_tag):
    soup = BeautifulSoup(string, "html.parser")
    nodes = soup.findAll(old_tag)
    for node in nodes:
        new_content = BeautifulSoup("<{0}>{1}</{0}".format(
            new_tag, node.contents[0],
        ))
        node.replaceWith(new_content)
    string = soup.body.contents[0]
    return string

Any idea how I could just replace the tag element itself in the soup? Or, even better, does anyone know of a library/utility function that'll handle this more robustly than something I'd write?

Thank you!

Upvotes: 3

Views: 1362

Answers (1)

Keyur Potdar

Reputation: 7238

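It's actually pretty simple: every tag has a writable .name attribute, so you can rename a node in place with node.name = new_tag. The tag's attributes and children are left untouched.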

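from bs4 import BeautifulSoup

def replace_tags(string, old_tag, new_tag):
    soup = BeautifulSoup(string, "html.parser")
    # Renaming a tag in place keeps its attributes and children.
    for node in soup.find_all(old_tag):  # find_all is the current name for findAll
        node.name = new_tag
    return soup  # or return str(soup) if you want a string.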

text = "<p>this is some text<p><bad class='foo' data-foo='bar'> with some tags</bad><span>that I would</span><bad>like to replace</bad>"
new_text = replace_tags(text, "bad", "p")
print(new_text)

Output:

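<p>this is some text<p><p class="foo" data-foo="bar"> with some tags</p><span>that I would</span><p>like to replace</p></p></p>

The trailing </p></p> is not caused by the rename. The sample input never closes its two opening <p> tags, so html.parser nests them and closes them at the end of the document.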
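To check that attributes and nested children survive the rename, here is a minimal self-contained sketch on a well-formed fragment. The function name rename_tags and the sample markup are made up for illustration. It assumes the bs4 package is installed, and it returns a string rather than the soup object.

```python
from bs4 import BeautifulSoup


def rename_tags(markup, old_tag, new_tag):
    """Return markup with every <old_tag> element renamed to <new_tag>."""
    soup = BeautifulSoup(markup, "html.parser")
    for node in soup.find_all(old_tag):
        # Only the name changes; attributes and children stay on the node.
        node.name = new_tag
    return str(soup)


# Hypothetical sample input, well-formed so the result is easy to compare.
sample = "<div><bad class='foo' data-foo='bar'>hello <b>world</b></bad></div>"
result = rename_tags(sample, "bad", "p")
print(result)
```

The nested <b> element and both attributes should come through unchanged. Beautiful Soup writes attribute values back out with double quotes, so the quoting differs from the input.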

From the documentation:

Every tag has a name, accessible as .name:

tag.name
# u'b' 

If you change a tag’s name, the change will be reflected in any HTML markup generated by Beautiful Soup:

tag.name = "blockquote" 
tag
# <blockquote class="boldest">Extremely bold</blockquote>
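The documentation excerpt assumes a tag object already exists. A self-contained version might look like the sketch below, assuming bs4 is installed and building the tag from the same markup the excerpt prints. The variable names before and after are only there for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser")
tag = soup.b

before = tag.name          # the tag's current name
tag.name = "blockquote"    # rename in place
after = str(tag)           # markup regenerated with the new name

print(before)
print(after)
```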

Upvotes: 4
