Reputation: 21914
What is a good way to replace an HTML tag like:
Old : <div id=pgbrk" ....../>....Page Break....</div>
New : <!--page break -->
div
id might have many other values hence regex is not a good idea. I need some LXML kind of thing. Basically, my problem is to replace an HTML tag with a string!
Upvotes: 3
Views: 1845
Reputation: 16037
You can use plain DOM http://docs.python.org/library/xml.dom.minidom.html
1) parse your source
from xml.dom.minidom import parse
datasource = open('c:\\temp\\mydata.xml')
doc= parse(datasource)
2) find your nodes to remove
for node in doc.getElementsByTagName('div'):
for attr in node.attributes:
if attr.name == 'id':
...
3) when found targeted nodes, replace them with new comment node
parent = node.parentNode
parent.replaceChild(doc.createComment("page break"), node)
docs: http://docs.python.org/library/xml.dom.html
Upvotes: 2
Reputation: 879739
As long as your div
has a parent tag, you could do this:
import lxml.html as LH
import lxml.etree as ET
content='<root><div id="pgbrk" ......>....Page Break....</div></root>'
doc=LH.fromstring(content)
# print(LH.tostring(doc))
for div in doc.xpath('//div[@id="pgbrk"]'):
parent=div.getparent()
parent.replace(div,ET.Comment("page break"))
print(LH.tostring(doc))
yields
<root><!--page break--></root>
Upvotes: 3