houallet

Reputation: 249

Modifying a BeautifulSoup .string with line breaks

I am trying to change the content of an HTML file with BeautifulSoup. The new content comes from Python strings, so it contains \n newlines:

newContent = """This is my content \n with a line break."""
newContent = newContent.replace("\n", "<br>")
htmlFile.find("div", "product").p.string = newContent

When I do this, the text of the <p> element in the HTML file is changed to this:

This is my content &lt;br&gt; with a line break.

How do I change a string within a BeautifulSoup object and keep the <br> breaks? If the string just contains \n, the output gets a literal newline rather than a <br>.

Upvotes: 2

Views: 6139

Answers (1)

Martijn Pieters

Reputation: 1121942

You need to create separate elements; there isn't one piece of text contained in the <p> tag, but a series of text and <br/> elements.
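To see why assigning markup to .string does not work, here is a minimal sketch (it assumes the built-in "html.parser" backend and a made-up one-paragraph document). Whatever is assigned to .string is stored as a single text node, so the angle brackets are escaped on output and no <br> tag exists in the tree:

```python
from bs4 import BeautifulSoup

# Hypothetical one-paragraph document, parsed with the stdlib-backed parser.
soup = BeautifulSoup("<p></p>", "html.parser")

# The assigned value becomes one text node; it is never parsed as markup.
soup.p.string = "first<br>second"

escaped = str(soup)
print(escaped)  # <p>first&lt;br&gt;second</p>

# The tree holds no <br> element, only text.
has_br = soup.p.find("br") is not None
print(has_br)  # False
```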

Rather than replace \n newlines with the text <br/> (which will be escaped), split the text on newlines and insert extra elements in between:

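parent = htmlFile.find_all("div", "product")[0].p
parent.clear()  # drop the existing contents so the new text replaces them
lines = newContent.splitlines()
parent.append(htmlFile.new_string(lines[0]))
for line in lines[1:]:
    # a real <br/> tag between lines, not the escaped text "<br>"
    parent.append(htmlFile.new_tag('br'))
    parent.append(htmlFile.new_string(line))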

This uses the Tag.append() method to add new elements to the tree, and BeautifulSoup.new_string() and BeautifulSoup.new_tag() to create those extra elements.
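The same steps can be wrapped in a small helper. This is only a sketch: the function name set_text_with_breaks is made up for illustration, and it assumes the "html.parser" backend and that the old contents of the tag should be discarded. Using enumerate() also avoids an IndexError on lines[0] when the text is empty:

```python
from bs4 import BeautifulSoup


def set_text_with_breaks(soup, tag, text):
    """Replace the contents of `tag` with `text`, one <br/> per newline.

    Hypothetical helper, not part of Beautiful Soup itself.
    """
    tag.clear()  # remove whatever the tag contained before
    for index, line in enumerate(text.splitlines()):
        if index:
            # Separate consecutive lines with a real <br/> element.
            tag.append(soup.new_tag("br"))
        # Text nodes are escaped on output, so "<" and "&" in the data are safe.
        tag.append(soup.new_string(line))


soup = BeautifulSoup('<div class="product"><p>old text</p></div>', "html.parser")
set_text_with_breaks(soup, soup.find("div", "product").p,
                     "This is my content \n with a line break.")
result = str(soup.p)
print(result)  # <p>This is my content <br/> with a line break.</p>
```

Because each line is added as a text node, any markup characters in the incoming text stay escaped; only the line breaks become tags.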

Demo:

>>> from bs4 import BeautifulSoup
>>> htmlFile = BeautifulSoup('<p></p>')
>>> newContent = """This is my content \n with a line break."""
>>> parent = htmlFile.p
>>> lines = newContent.splitlines()
>>> parent.append(htmlFile.new_string(lines[0]))
>>> for line in lines[1:]:
...     parent.append(htmlFile.new_tag('br'))
...     parent.append(htmlFile.new_string(line))
... 
>>> print(htmlFile.prettify())
<html>
 <head>
 </head>
 <body>
  <p>
   This is my content
   <br/>
   with a line break.
  </p>
 </body>
</html>

Upvotes: 2
