McAbra

BeautifulSoup Python is stripping my HTML comments

I want to keep my HTML comments intact, but they are being stripped.

from bs4 import BeautifulSoup
from bs4.builder import HTMLParserTreeBuilder

content = """
    <html>
        <head>
            <title>Foo</title>
            <!-- testcomment -->
        </head>
        <body>
            <div id="mycontent">Here's my content</div>
            <!-- testcomment2 -->
        </body>
    </html>"""
soup = BeautifulSoup(content, builder=HTMLParserTreeBuilder())
print(soup.body.contents)

returns

[u'\n', <div id="mycontent">Here's my content</div>, u'\n', u' testcomment2 ', u'\n']

Is there a flag I can pass to have my comments intact?

EDIT: The expected output is exactly what's in the content variable.

Answers (1)

Martijn Pieters

The comments are there, but their __repr__ output omits the <!-- prefix and the --> suffix.

You can call the Comment.output_ready() method to include those:

>>> soup.body.contents[3].output_ready()
u'<!-- testcomment2 -->'
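
To check that nothing was actually removed, you can look at the type of each string in the tree. This is a minimal sketch, assuming BeautifulSoup 4 (bs4 4.4 or later, for the string= argument) on Python 3, and passing "html.parser" instead of a builder instance; both select the standard-library HTML parser. It collects every Comment node and prints its repr next to its output_ready() form:

```python
from bs4 import BeautifulSoup, Comment

content = """
<html>
    <head>
        <title>Foo</title>
        <!-- testcomment -->
    </head>
    <body>
        <div id="mycontent">Here's my content</div>
        <!-- testcomment2 -->
    </body>
</html>"""

# "html.parser" uses Python's built-in HTMLParser, like HTMLParserTreeBuilder.
soup = BeautifulSoup(content, "html.parser")

# Comment is a NavigableString subclass; its repr shows only the inner text,
# which is why the comments looked "stripped" in soup.body.contents.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

for comment in comments:
    # output_ready() adds the <!-- and --> delimiters back.
    print(repr(comment), "->", comment.output_ready())
```

Both comments, including the one inside <head>, come back as Comment objects; only their repr hides the delimiters.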

Alternatively, convert a parent element to unicode, or call its .prettify() method:

>>> unicode(soup.body)
u'<body>\n<div id="mycontent">Here\'s my content</div>\n<!-- testcomment2 -->\n</body>'
>>> print(unicode(soup.body))
<body>
<div id="mycontent">Here's my content</div>
<!-- testcomment2 -->
</body>
>>> print(soup.body.prettify())
<body>
 <div id="mycontent">
  Here's my content
 </div>
 <!-- testcomment2 -->
</body>
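
On Python 3 the counterpart of unicode() is str(), and calling it on the whole soup serializes every Comment with its delimiters. A minimal sketch, again assuming bs4 on Python 3 with "html.parser". As the outputs above show, runs of pure whitespace come back collapsed (to a single '\n'), so the result keeps the comments but is not a byte-for-byte copy of content:

```python
from bs4 import BeautifulSoup

content = """
<html>
    <head>
        <title>Foo</title>
        <!-- testcomment -->
    </head>
    <body>
        <div id="mycontent">Here's my content</div>
        <!-- testcomment2 -->
    </body>
</html>"""

soup = BeautifulSoup(content, "html.parser")

# str() serializes the whole tree; each Comment is written back as <!-- ... -->.
output = str(soup)
print(output)
```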

Also see the Output formatters documentation.
