Reputation: 23154
Let's say we have a html doc like this:
<html>
<head>
<title>title</title>
</head>
<body>
<div class="c1">division
<p>
passage in division
<b>bold in passage </b>
</p>
</div>
</body>
</html>
I need to prepend a word "cool " to each string (or NavigableString in bs4 terminology) in the html doc.
I have tried to walk through every element and check if it has any children, if not then edit the string. This is inaccurate, besides, the editing didn't take any effect.
Upvotes: 4
Views: 1078
Reputation: 474271
You can find all text nodes in the document by calling find_all()
with text=True
argument. Use replace_with()
to replace the text nodes with the modified text:
from bs4 import BeautifulSoup
html = """
<html>
<head>
<title>title</title>
</head>
<body>
<div class="c1">division
<p>
passage in division
<b>bold in passage </b>
</p>
</div>
</body>
</html>
"""
soup = BeautifulSoup(html)
for element in soup.find_all(text=True):
text = element.string.strip()
if text:
element.replace_with("cool " + text)
print soup.prettify()
Prints:
<html>
<head>
<title>
cool title
</title>
</head>
<body>
<div class="c1">
cool division
<p>
cool passage in division
<b>
cool bold in passage
</b>
</p>
</div>
</body>
</html>
Upvotes: 5