BeautifulSoup Python is stripping my HTML comments

Question

I want to keep the comments in my HTML intact, but BeautifulSoup appears to strip them.

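from bs4 import BeautifulSoup
from bs4.builder import HTMLParserTreeBuilder

content = """
    <html>
        <head>
            <title>Foo</title>
            <!-- testcomment1 -->
        </head>
        <body>
            <p>Here's my content</p>
            <!-- testcomment2 -->
        </body>
    </html>"""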
soup = BeautifulSoup(content, builder=HTMLParserTreeBuilder())
print soup.body.contents

returns

[u'\n', <p>Here's my content</p>, u'\n', u' testcomment2 ', u'\n']

Is there a flag I can pass to have my comments intact?

EDIT: The expected output is exactly what's in the content variable.

Martijn Pieters · Accepted Answer

The comments are there, but their __repr__ output doesn't include the <!-- prefix and --> suffix.

You can call the Comment.output_ready() method to include those:

>>> soup.body.contents[3].output_ready()
u'<!-- testcomment2 -->'
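
As a minimal sketch of the same check on Python 3, assuming bs4 is installed and using the built-in "html.parser" backend: the sample markup, the <p> element, and the variable names below are illustrative rather than taken from the question. It shows that the comment survives parsing as a Comment node, and that only its repr leaves out the delimiters.

```python
from bs4 import BeautifulSoup, Comment

# Assumed sample markup; the <p> element is illustrative.
html = "<body>\n<p>Here's my content</p>\n<!-- testcomment2 -->\n</body>"
soup = BeautifulSoup(html, "html.parser")

# Comments are kept in the tree as Comment nodes (a NavigableString subclass),
# so they can be selected with an isinstance() filter on the strings.
comments = soup.find_all(string=lambda node: isinstance(node, Comment))

for comment in comments:
    print(repr(comment))           # the comment text only, no delimiters
    print(comment.output_ready())  # serialized form, with <!-- and -->
```

Filtering on Comment this way is handy when you need to inspect or rewrite every comment in a document rather than index into .contents by position.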

or convert a parent to unicode, or call the .prettify() method:

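>>> unicode(soup.body)
u'<body>\n<p>Here\'s my content</p>\n<!-- testcomment2 -->\n</body>'
>>> print(unicode(soup.body))
<body>
<p>Here's my content</p>
<!-- testcomment2 -->
</body>
>>> print(soup.body.prettify())
<body>
 <p>
  Here's my content
 </p>
 <!-- testcomment2 -->
</body>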

Also see the Output formatters documentation.
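
On Python 3 there is no unicode() builtin, so a rough equivalent of the session above would use str() instead. This is a sketch under the assumption that bs4 is installed and the "html.parser" backend is used; the sample markup and variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Assumed sample markup; the <p> element is illustrative.
html = "<body>\n<p>Here's my content</p>\n<!-- testcomment2 -->\n</body>"
soup = BeautifulSoup(html, "html.parser")

# str() on a tag serializes the whole subtree, comments included.
serialized = str(soup.body)
print(serialized)

# prettify() re-indents the subtree; the comment keeps its delimiters.
pretty = soup.body.prettify()
print(pretty)
```

Either call is enough to write the document back out with its comments intact; the comment only looks "stripped" when a single Comment node is inspected through its repr.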
