What is going on with this html5lib script?

Question

I'm trying to parse a very simple HTML5 document and render it back out using html5lib:

import html5lib

html = '''

    
        Hi
    
    
        
        
    

'''

parser = html5lib.HTMLParser(tree = html5lib.treebuilders.getTreeBuilder("lxml"))
walker = html5lib.treewalkers.getTreeWalker("lxml")
serializer = html5lib.serializer.htmlserializer.HTMLSerializer()

document = parser.parse(html)
stream = walker(document)
theHTML = serializer.render(stream)

print theHTML

The output looks like:

Hi

Yup. It just cuts off midway. Changing the tree builder from lxml to dom makes no difference. Tweaking the HTML changes the output, but it's still pretty corrupt.

RanRag · Accepted Answer

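The key seems to be omit_optional_tags=False. Without it, the serializer eats the end of the output. The option defaults to True, and in that mode the serializer leaves out tags the HTML spec treats as optional (closing tags such as </body> and </html>, for example), which is most likely why the result looks truncated.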

parser = html5lib.HTMLParser(tree=html5lib.treebuilders.getTreeBuilder("lxml"))
document = parser.parse(html)
walker = html5lib.treewalkers.getTreeWalker("lxml")
stream = walker(document)
# Keep optional tags in the output instead of dropping them
s = html5lib.serializer.htmlserializer.HTMLSerializer(omit_optional_tags=False)
output_generator = s.serialize(stream)
# serialize() yields the output piece by piece
for item in output_generator:
    print item

Hi

>>>
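
For anyone on Python 3 with a current html5lib, here is a minimal sketch of the same comparison. It assumes html5lib 1.x, where HTMLSerializer is imported from html5lib.serializer rather than the htmlserializer submodule used above, and it uses the etree tree builder so lxml is not needed. The doc string is a hypothetical stand-in, not the markup from the question.

```python
import html5lib
from html5lib.serializer import HTMLSerializer

# Hypothetical stand-in document, not the markup from the question.
doc = "<html><head><title>Hi</title></head><body><p>text</p></body></html>"

# Parse into an ElementTree and get a matching tree walker.
tree = html5lib.parse(doc, treebuilder="etree")
walker = html5lib.getTreeWalker("etree")

# Default settings: tags the HTML spec marks as optional are left out,
# so the result can look as if it stops early.
compact = HTMLSerializer().render(walker(tree))

# Same tree, but with every start and end tag written out.
full = HTMLSerializer(omit_optional_tags=False).render(walker(tree))

print(compact)
print(full)
```

Both strings should parse back to the same tree. The shorter one only looks cut off because the optional tags are missing.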
