Paul

Reputation: 27473

Making Beautiful Soup play nice with Handlebars

I am writing a script to refactor template-enhanced HTML.

I'd like Beautiful Soup to output any code inside {{ }} verbatim, without converting > or other symbols into HTML entities.

The script needs to split a file containing many templates into separate files, one template per file.

Spec: splitTemplates templates.html templateDir must:

  1. read templates.html
  2. for each <template name="xxx">contents {{ > directive }}</template>, write this template to file templateDir/xxx

Code snippet:

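    import bs4

    # html.parser is named explicitly so the result does not depend on which parsers are installed
    with open(filename) as src:
        soup = bs4.BeautifulSoup(src, "html.parser")
    for t in soup.children:
        if t.name == "template":
            newfname = dirname + "/" + t["name"] + ext
            with open(newfname, "w") as f:
                # note f.write(t) fails as t is not a string -- prettify() returns one
                f.write(t.prettify())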

Undesired behavior:

{{ > directive }} becomes {{ &gt; directive }} but needs to be preserved as {{ > directive }}

Update: f.write(t.prettify(formatter=None)) preserves some > characters but still converts others to &gt;. Passing formatter=None seemed like the most obvious fix, but I am not sure why it escapes some > and not others.

Solved: Thanks to Hai Vu and also https://stackoverflow.com/a/663128/103081

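    import bs4
    import HTMLParser  # Python 2 module name

    # unescape() turns entities such as &gt; back into literal characters
    U = HTMLParser.HTMLParser().unescape
    soup = bs4.BeautifulSoup(open(filename, "r"))
    for t in soup.children:
        if t.name == "template":
            newfname = dirname + "/" + t["name"] + ext
            with open(newfname, "w") as f:
                # formatter=None skips bs4's output escaping; U() undoes any entities that remain
                f.write(U(t.prettify(formatter=None)))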
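
The snippet above targets Python 2 (import HTMLParser) and unescapes the entire output, so an entity that was intentional outside {{ }} gets unescaped too. On Python 3.4 and later, the standard library's html.unescape does the same job as HTMLParser().unescape, which was removed in Python 3.9. Below is a minimal sketch that restricts the unescaping to {{ }} spans. The helper name unescape_handlebars and the regex are my own assumptions, not part of the original solution:

```python
import html
import re

# Matches one handlebars expression such as "{{ &gt; directive }}".
# Non-greedy with DOTALL so an expression spanning several lines is still one match.
HANDLEBARS = re.compile(r"\{\{.*?\}\}", re.DOTALL)


def unescape_handlebars(markup):
    """Unescape HTML entities only inside {{ ... }} spans (hypothetical helper)."""
    return HANDLEBARS.sub(lambda m: html.unescape(m.group(0)), markup)


sample = "<p>a &lt; b</p> {{ &gt; directive }}"
print(unescape_handlebars(sample))
```

In the loop, this would presumably wrap the prettify call, as in f.write(unescape_handlebars(t.prettify(formatter=None))), leaving entities in ordinary text untouched.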

See Also: https://gist.github.com/DrPaulBrewer/9104465

Upvotes: 1

Views: 224

Answers (1)

Hai Vu

Reputation: 40773

You need to unescape the HTML contents. To do that, use the HTMLParser module. This is not my original idea. I took it from:

How can I change '>' to '&gt;' and '&gt;' to '>'?
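
For reference, here is a small sketch of both directions using only the standard library. It assumes Python 3.4 or later, where html.escape and html.unescape are available; on Python 2 the HTMLParser module mentioned above provides the unescape half. The variable names are just for illustration:

```python
import html

original = "{{ > directive }}"

# '>' to '&gt;': what the serializer effectively does to the template text
escaped = html.escape(original)

# '&gt;' to '>': the unescape step that restores the handlebars directive
restored = html.unescape(escaped)

print(escaped)
print(restored)
```

Unescaping after serializing is a post-processing workaround, so it also converts any entities that were meant to stay escaped in the output.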

Upvotes: 1
