qed

Reputation: 23154

Edit all strings in an HTML doc with BeautifulSoup 4 in Python

Let's say we have an HTML doc like this:

<html>
<head>
<title>title</title>
</head>
<body>
<div class="c1">division
    <p>
    passage in division
    <b>bold in passage </b>
    </p>
</div>
</body>
</html>

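I need to prepend the word "cool " to each string (a NavigableString in bs4 terminology) in the HTML doc.

I tried walking through every element and editing its string whenever the element had no children. That approach misses text: a string that sits next to a child tag, such as "division" in the div above, is skipped because its parent does have children. On top of that, the edits I did make had no effect on the document.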

Upvotes: 4

Views: 1078

Answers (1)

alecxe

Reputation: 474271

You can find all text nodes in the document by calling find_all() with the text=True argument (spelled string=True since Beautiful Soup 4.4.0). Then use replace_with() to swap each text node for the modified text:

from bs4 import BeautifulSoup

html = """
<html>
<head>
<title>title</title>
</head>
<body>
<div class="c1">division
    <p>
    passage in division
    <b>bold in passage </b>
    </p>
</div>
</body>
</html>
"""

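# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, "html.parser")

# text=True matches every text node in the tree
for element in soup.find_all(text=True):
    text = element.strip()
    if text:  # skip whitespace-only nodes between tags
        element.replace_with("cool " + text)

print(soup.prettify())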

Prints:

<html>
 <head>
  <title>
   cool title
  </title>
 </head>
 <body>
  <div class="c1">
   cool division
   <p>
    cool passage in division
    <b>
     cool bold in passage
    </b>
   </p>
  </div>
 </body>
</html>

Upvotes: 5
