Reputation: 249
I am trying to change the content of an HTML file with BeautifulSoup. The content comes from Python-generated text, so it will contain \n newlines...
newContent = """This is my content \n with a line break."""
newContent = newContent.replace("\n", "<br>")
htmlFile.find_all("div", "product")[0].p.string = newContent
When I do this, the <p> text in the HTML file is changed to this:
This is my content &lt;br&gt; with a line break.
How do I change a string within a BeautifulSoup object and keep the <br> breaks? If the string just contains \n, it creates an actual line break instead.
Upvotes: 2
Views: 6139
Reputation: 1121942
You need to create separate elements; there isn't one piece of text contained in the <p> tag, but a series of text and <br/> elements.
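To make that structure visible, here is a minimal sketch that parses a paragraph containing a line break and lists its children, then shows what happens when markup is assigned as a plain string. The sample markup and the explicit "html.parser" choice are assumptions for illustration, not part of the question.

```python
from bs4 import BeautifulSoup

# Sample markup (an assumption, for illustration only).
soup = BeautifulSoup("<p>first line<br/>second line</p>", "html.parser")

# The <p> tag holds three children: text, a <br/> tag, and more text.
children = list(soup.p.children)
kinds = [type(child).__name__ for child in children]
print(kinds)

# Assigning markup as a plain string escapes the angle brackets,
# so the "<br>" ends up as literal text rather than a tag.
soup.p.string = "first line<br>second line"
escaped = str(soup.p)
print(escaped)
```

The first print lists the node types in order; the second shows the escaped form that the question ran into.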
Rather than replace \n newlines with the text <br/> (which will be escaped), split the text on newlines and insert extra elements in between:
parent = htmlFile.find_all("div", "product")[0].p
lines = newContent.splitlines()
parent.append(htmlFile.new_string(lines[0]))
for line in lines[1:]:
    parent.append(htmlFile.new_tag('br'))
    parent.append(htmlFile.new_string(line))
This uses the Element.append() method to add new elements to the tree, and BeautifulSoup.new_string() and BeautifulSoup.new_tag() to create those extra elements.
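If you need this in more than one place, the same steps could be wrapped in a small helper. This is only a sketch: the name set_text_with_breaks is hypothetical, the sample markup and the "html.parser" choice are assumptions, and unlike the snippet above it first clears the tag so the old text is replaced rather than appended to.

```python
from bs4 import BeautifulSoup


def set_text_with_breaks(soup, tag, text):
    """Replace tag's contents with text, putting a <br/> between lines."""
    tag.clear()  # drop whatever the tag held before
    for index, line in enumerate(text.splitlines()):
        if index:  # a <br/> goes before every line except the first
            tag.append(soup.new_tag("br"))
        tag.append(soup.new_string(line))


# Sample document (an assumption, for illustration only).
soup = BeautifulSoup('<div class="product"><p>old text</p></div>', "html.parser")
target = soup.find("div", "product").p
set_text_with_breaks(soup, target, "This is my content \n with a line break.")
result = str(target)
print(result)
```

Because each line goes through new_string(), any angle brackets in the text itself are still escaped on output; only the inserted br tags are real elements. Empty input simply leaves the tag empty, since splitlines() returns an empty list.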
Demo:
>>> from bs4 import BeautifulSoup
>>> htmlFile = BeautifulSoup('<p></p>')
>>> newContent = """This is my content \n with a line break."""
>>> parent = htmlFile.p
>>> lines = newContent.splitlines()
>>> parent.append(htmlFile.new_string(lines[0]))
>>> for line in lines[1:]:
...     parent.append(htmlFile.new_tag('br'))
...     parent.append(htmlFile.new_string(line))
...
>>> print(htmlFile.prettify())
<html>
 <head>
 </head>
 <body>
  <p>
   This is my content
   <br/>
   with a line break.
  </p>
 </body>
</html>
Upvotes: 2