My problem is that I want to keep my HTML comments intact, but they are stripped.
from bs4 import BeautifulSoup
from bs4.builder import HTMLParserTreeBuilder

content = """
<html>
<head>
<title>Foo</title>
<!-- testcomment -->
</head>
<body>
<div id="mycontent">Here's my content</div>
<!-- testcomment2 -->
</body>
</html>"""
soup = BeautifulSoup(content, builder=HTMLParserTreeBuilder())
print(soup.body.contents)
which prints:
[u'\n', <div id="mycontent">Here's my content</div>, u'\n', u' testcomment2 ', u'\n']
Is there a flag I can pass to keep my comments intact?
EDIT
The expected output is exactly what's in the content variable.
The comments are there, but their __repr__ representation doesn't include the <!-- and --> prefix and suffix.
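A minimal check, assuming Python 3 with the bs4 package installed (so unicode() becomes str()) and using the "html.parser" feature string, which should select the same HTMLParserTreeBuilder as in the question. It shows that the fourth child of <body> is a Comment object whose string value is only the text between the markers:

```python
from bs4 import BeautifulSoup, Comment

content = """
<html>
<head>
<title>Foo</title>
<!-- testcomment -->
</head>
<body>
<div id="mycontent">Here's my content</div>
<!-- testcomment2 -->
</body>
</html>"""

# "html.parser" picks the stdlib-based HTMLParserTreeBuilder
soup = BeautifulSoup(content, "html.parser")

# body.contents is ['\n', <div>, '\n', <comment>, '\n'], so index 3 is the comment
node = soup.body.contents[3]

# Comment is a str subclass; its value excludes the <!-- and --> markers
print(type(node).__name__)  # Comment
print(repr(str(node)))      # ' testcomment2 '
```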
You can call the Comment.output_ready() method to include those:
>>> soup.body.contents[3].output_ready()
u'<!-- testcomment2 -->'
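To collect every comment in the document instead of indexing one by position, a sketch along these lines (same assumptions: Python 3, bs4 installed, "html.parser" as the parser) filters string nodes by type and then calls output_ready() on each:

```python
from bs4 import BeautifulSoup, Comment

content = """
<html>
<head>
<title>Foo</title>
<!-- testcomment -->
</head>
<body>
<div id="mycontent">Here's my content</div>
<!-- testcomment2 -->
</body>
</html>"""

soup = BeautifulSoup(content, "html.parser")

# Match only string nodes that are Comment instances, in document order
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

# output_ready() re-attaches the <!-- and --> markers to each comment
marked = [c.output_ready() for c in comments]
print(marked)  # ['<!-- testcomment -->', '<!-- testcomment2 -->']
```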
Alternatively, convert a parent element to unicode, or call its .prettify() method:
>>> unicode(soup.body)
u'<body>\n<div id="mycontent">Here\'s my content</div>\n<!-- testcomment2 -->\n</body>'
>>> print(unicode(soup.body))
<body>
<div id="mycontent">Here's my content</div>
<!-- testcomment2 -->
</body>
>>> print(soup.body.prettify())
<body>
<div id="mycontent">
Here's my content
</div>
<!-- testcomment2 -->
</body>
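Since the question asks for the original markup back, the simplest route is to serialize the whole BeautifulSoup object rather than a single element. A sketch under the same assumptions (Python 3, bs4 installed, "html.parser"); str(soup) here plays the role of unicode(soup) in Python 2, and both comments, including the one inside <head>, should appear in the result:

```python
from bs4 import BeautifulSoup

content = """
<html>
<head>
<title>Foo</title>
<!-- testcomment -->
</head>
<body>
<div id="mycontent">Here's my content</div>
<!-- testcomment2 -->
</body>
</html>"""

soup = BeautifulSoup(content, "html.parser")

# Serializing the root writes every node back out; Comment nodes
# are rendered with their <!-- and --> markers restored
output = str(soup)
print(output)
```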
Also see the Output formatters documentation.