Reputation: 23078
I want to parse HTML and turn it into string templates. In the example below, I sought out elements marked with x-inner
and they became template placeholders in the final string. The x-attrsite attribute
also became a template placeholder (with a different command, of course).
Input:
<div class="x,y,z" x-attrsite>
    <div x-inner></div>
    <div>
        <div x-inner></div>
    </div>
</div>
Desired output:
<div class="x,y,z" {attrsite}>{inner}<div>{inner}</div></div>
I know there is HTMLParser and BeautifulSoup, but I am at a loss on how to extract the strings before and after the x-*
markers and to escape those strings for templating.
Existing curly braces should also be handled sanely, as in this sample:
<div x-maybe-highlighted> The template string "there are {n} message{suffix}" can be used.</div>
Upvotes: 1
Views: 1244
Reputation: 473763
BeautifulSoup can handle this case:
- find all div elements with the x-attrsite attribute, remove the attribute, and add a {attrsite} attribute with a value of None (which produces an attribute with no value)
- find all div elements with the x-inner attribute and use replace_with() to replace each element with the text {inner}
Implementation:
from bs4 import BeautifulSoup

data = """
<div class="x,y,z" x-attrsite>
    <div x-inner></div>
    <div>
        <div x-inner></div>
    </div>
</div>
"""

soup = BeautifulSoup(data, 'html.parser')

# Swap the x-attrsite marker for a valueless {attrsite} attribute.
for div in soup.find_all('div', {'x-attrsite': True}):
    del div['x-attrsite']
    div['{attrsite}'] = None

# Replace each x-inner element with the text {inner}.
for div in soup.find_all('div', {'x-inner': True}):
    div.replace_with('{inner}')

print(soup.prettify())
Prints:
<div class="x,y,z" {attrsite}>
 {inner}
 <div>
  {inner}
 </div>
</div>
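The question also asks for single-line output and for existing curly braces to be escaped. Here is a minimal sketch of one way to get there with the same approach. The helper names escape_braces and html_to_template are my own, not part of BeautifulSoup. It assumes the template will be filled with str.format (so literal braces are doubled) and that whitespace-only text between tags is insignificant and can be dropped, which would not hold inside something like a pre element.

```python
from bs4 import BeautifulSoup


def escape_braces(value):
    # Hypothetical helper: double literal braces so str.format leaves them alone.
    return value.replace("{", "{{").replace("}", "}}")


def html_to_template(html):
    # Hypothetical helper: turn x-* marked HTML into a str.format-style template.
    soup = BeautifulSoup(html, "html.parser")

    # Escape braces in existing text first, before any placeholders are added.
    # Whitespace-only text nodes are dropped (assumption: they carry no meaning).
    for text in soup.find_all(string=True):
        if not str(text).strip():
            text.extract()
        else:
            # type(text) keeps comments as comments instead of plain text.
            text.replace_with(type(text)(escape_braces(str(text))))

    # Escape braces in existing attribute values (class values arrive as lists).
    for tag in soup.find_all(True):
        for name, value in list(tag.attrs.items()):
            if isinstance(value, list):
                tag.attrs[name] = [escape_braces(v) for v in value]
            elif isinstance(value, str):
                tag.attrs[name] = escape_braces(value)

    # Same marker handling as above, but for any tag name.
    for tag in soup.find_all(attrs={"x-attrsite": True}):
        del tag["x-attrsite"]
        tag["{attrsite}"] = None
    for tag in soup.find_all(attrs={"x-inner": True}):
        tag.replace_with("{inner}")

    return str(soup)


sample = """
<div class="x,y,z" x-attrsite>
    <div x-inner></div>
    <div>
        <div x-inner></div>
    </div>
</div>
"""

template = html_to_template(sample)
print(template)
print(template.format(attrsite='id="main"', inner="hello"))
```

Escaping has to happen before the markers are replaced; otherwise the freshly inserted {inner} placeholders would be doubled as well. Braces inside attribute names are not handled here.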
Upvotes: 2