Making beautiful soup play nice with handlebars

Question

I am making a script to refactor template-enhanced html.

I'd like beautifulsoup to print verbatim any code inside {{ }} without transforming > or other symbols into html entities.

The script needs to split a file containing many templates into many files, each with one template.

Spec: splitTemplates templates.html templateDir must:

read templates.html
for each <template name="xxx">, write this template to file templateDir/xxx

Code snippet:

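import os
import bs4

# filename, dirname and ext are defined elsewhere in the script
with open(filename, "r") as src:
    soup = bs4.BeautifulSoup(src, "html.parser")
for t in soup.children:
    # top-level text nodes have name None, so only <template> tags match
    if t.name == "template":
        newfname = os.path.join(dirname, t["name"] + ext)
        with open(newfname, "w") as f:
            # note f.write(t) fails as t is not a string -- prettify works
            f.write(t.prettify())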

Undesired behavior:

{{ > directive }} becomes {{ &gt; directive }} but needs to be preserved as {{ > directive }}

Update: f.write(t.prettify(formatter=None)) sometimes preserves the > and sometimes changes it to &gt;. This seemed to be the most obvious change to try, but I am not sure why it changes some > and not others.

Solved: Thanks to Hai Vu and also https://stackoverflow.com/a/663128/103081

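    # Python 2 code: on Python 3 use html.unescape instead
    # (HTMLParser.unescape was removed in Python 3.9)
    import HTMLParser
    import bs4
    U = HTMLParser.HTMLParser().unescape
    soup = bs4.BeautifulSoup(open(filename,"r"))
    for t in soup.children:
        if t.name=="template":
            newfname = dirname+"/"+t["name"]+ext
            f = open(newfname,"w")
            # formatter=None turns off bs4's entity substitution;
            # U() decodes any entities that are still left
            f.write(U(t.prettify(formatter=None)))
            f.close()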

See Also: https://gist.github.com/DrPaulBrewer/9104465
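
On Python 3 the same idea might look like the sketch below. It assumes html.unescape from the standard library as the stand-in for HTMLParser().unescape, and the helper name unescape_handlebars is made up for illustration. Instead of unescaping the whole output, it only decodes entities inside {{ }}, so escaped markup elsewhere in a template stays escaped.

```python
import html
import re

# Matches one handlebars expression; non-greedy so adjacent expressions
# are handled separately. DOTALL lets an expression span several lines.
HANDLEBARS = re.compile(r"\{\{.*?\}\}", re.DOTALL)


def unescape_handlebars(text):
    """Decode HTML entities only inside {{ ... }} spans (hypothetical helper)."""
    return HANDLEBARS.sub(lambda m: html.unescape(m.group(0)), text)


# Sample input invented for illustration
rendered = "<p>1 &lt; 2</p>\n{{ &gt; directive }}"
print(unescape_handlebars(rendered))
```

In the loop above, this would take the place of U, as in f.write(unescape_handlebars(t.prettify(formatter=None))).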

Hai Vu · Accepted Answer

You need to unescape the HTML contents. To do that, use the HTMLParser module. This is not my original idea. I took it from:

How can I change '>' to '&gt;' and '&gt;' to '>'?
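
For reference, here is a minimal sketch of both directions on Python 3, assuming html.escape and html.unescape from the standard library in place of the Python 2 HTMLParser module. The sample string is invented for illustration.

```python
import html

raw = "{{ > directive }}"
escaped = html.escape(raw)         # '>' becomes '&gt;'
restored = html.unescape(escaped)  # '&gt;' becomes '>' again

print(escaped)
print(restored)
```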
