Nishant
Nishant

Reputation: 21914

How do I replace a HTML element with some new format in Python

What is a good way to replace an HTML tag like:

Old : <div id=pgbrk" ....../>....Page Break....</div>

New : <!--page break -->

div id might have many other values hence regex is not a good idea. I need some LXML kind of thing. Basically, my problem is to replace an HTML tag with a string!

Upvotes: 3

Views: 1845

Answers (2)

bpgergo
bpgergo

Reputation: 16037

You can use plain DOM http://docs.python.org/library/xml.dom.minidom.html

1) parse your source

from xml.dom.minidom import parse
datasource = open('c:\\temp\\mydata.xml')
doc= parse(datasource)

2) find your nodes to remove

for node in doc.getElementsByTagName('div'):
    for attr in node.attributes:
        if attr.name == 'id':
            ...

3) when found targeted nodes, replace them with new comment node

parent = node.parentNode
parent.replaceChild(doc.createComment("page break"), node)

docs: http://docs.python.org/library/xml.dom.html

Upvotes: 2

unutbu
unutbu

Reputation: 879739

As long as your div has a parent tag, you could do this:

import lxml.html as LH
import lxml.etree as ET

content='<root><div id="pgbrk" ......>....Page Break....</div></root>'
doc=LH.fromstring(content)
# print(LH.tostring(doc))    
for div in doc.xpath('//div[@id="pgbrk"]'):
    parent=div.getparent()
    parent.replace(div,ET.Comment("page break"))
print(LH.tostring(doc))

yields

<root><!--page break--></root>

Upvotes: 3

Related Questions