Reputation: 27473
I am making a script to refactor template-enhanced html.
I'd like BeautifulSoup to print verbatim any code inside {{ }} without transforming > or other symbols into HTML entities.
It needs to split a file with many templates into many files, each with one template.
Spec: splitTemplates templates.html templateDir must, for each <template name="xxx">contents {{ > directive }}</template>, write this template to file templateDir/xxx
Code snippet:
    import bs4

    soup = bs4.BeautifulSoup(open(filename, "r"))
    for t in soup.children:
        if t.name == "template":
            newfname = dirname + "/" + t["name"] + ext
            f = open(newfname, "w")
            # note f.write(t) fails as t is not a string -- prettify works
            f.write(t.prettify())
            f.close()
Undesired behavior:
{{ > directive }} becomes {{ &gt; directive }} but needs to be preserved as {{ > directive }}
Update: f.write(t.prettify(formatter=None)) sometimes preserves the > and sometimes changes it to &gt;. This seemed to be the most obvious change to try, but I'm not sure why it changes some > but not others.
Solved: Thanks to Hai Vu and also https://stackoverflow.com/a/663128/103081
    import HTMLParser

    U = HTMLParser.HTMLParser().unescape
    soup = bs4.BeautifulSoup(open(filename, "r"))
    for t in soup.children:
        if t.name == "template":
            newfname = dirname + "/" + t["name"] + ext
            f = open(newfname, "w")
            f.write(U(t.prettify(formatter=None)))
            f.close()
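One caveat with unescaping the whole output is that entities written deliberately outside {{ }} get decoded too. A minimal sketch of a narrower variant, assuming Python 3 (html.unescape from the standard library) and assuming directives never contain a literal }} inside them; DIRECTIVE and unescape_directives are hypothetical names, not part of the original script:

```python
import html
import re

# Matches one {{ ... }} directive, non-greedy, also across newlines.
DIRECTIVE = re.compile(r"\{\{.*?\}\}", re.DOTALL)

def unescape_directives(markup):
    """Decode HTML entities inside {{ ... }} only; leave everything else untouched."""
    return DIRECTIVE.sub(lambda m: html.unescape(m.group(0)), markup)

sample = '<template name="xxx">a &amp; b {{ &gt; directive }}</template>'
print(unescape_directives(sample))
```

In the loop above this would be applied to the string returned by t.prettify() before writing it, so that only the template directives are restored.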
See Also: https://gist.github.com/DrPaulBrewer/9104465
Upvotes: 1
Views: 224
Reputation: 40773
You need to unescape the HTML contents. To do that, use the HTMLParser module. This is not my original idea. I took it from:
How can I change '>' to '&gt;' and '&gt;' to '>'?
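For reference, a small sketch of the unescape call on its own. The HTMLParser module is the Python 2 name; in Python 3 the same function is available as html.unescape, and HTMLParser().unescape was removed in Python 3.9. The try/except import below is just one way to cover both, not something taken from the linked answer:

```python
try:
    from html import unescape  # Python 3.4+
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 fallback
    unescape = HTMLParser().unescape

# Entities are decoded back to the literal characters.
print(unescape("{{ &gt; directive }}"))
```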
Upvotes: 1