Reputation: 125
My requirement is quite straightforward. I want to use an XML processing library in Python that lets me query the document to find specific nodes based on some logic (say, an attribute value check) and return the exact string of the node as it appears in the input XML file, without any beautification, pretty printing, implicit/explicit tag closing, attribute reordering, etc. The reason is that once I have the exact string of the node, I can "grep" the input XML file to locate its line number.
I tried xml.dom.minidom and xml.etree.ElementTree, but neither fits my requirements. To clarify, here is an example:
<?xml version="1.0" encoding="utf-8"?>
...snip...
<sometag attr2='1' attr1="3" />
...snip...
from xml.dom.minidom import parse
dom = parse(path)
sometags = dom.getElementsByTagName('sometag')
for node in sometags:
    if int(node.getAttribute('attr1')) > 0:
        print(node.toxml())
import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
elem = tree.findall('.//sometag')[0]
str_elem = ET.tostring(elem, method='xml')  # tried changing method to 'text' and 'html'; no luck
Both snippets generate the XML string rather than extracting it from the input file, so the result is not exactly the string that appears in the file, and a string search for it in the file fails. By "exact string" I mean the following: one library closes the tag implicitly while the other closes it explicitly, regardless of which form the XML file uses, and one library orders the attributes alphabetically while the other does not. Is there a way to get this done? Please help.
Upvotes: 0
Views: 241
Reputation: 87124
Using lxml, you can get the line number from the Element.sourceline attribute, so you no longer need to grep the source XML file for the node.
>>> from lxml import etree
>>> root = etree.parse('file.xml').getroot()
>>> elem = root.findall('.//sometag')[0]
>>> print(root.sourceline)
2
>>> print(elem.sourceline)
3
If, for some reason, you can't use lxml, take a look at http://davejingtian.org/2014/07/04/python-hacking-make-elementtree-support-line-number/
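As another standard-library option, here is a minimal sketch that uses xml.parsers.expat directly. It is not the approach from the linked post. It records the line on which each matching start tag begins and then returns that raw line from the input untouched. The helper name find_source_lines, the predicate argument, and the sample document are made up for illustration, and the sketch assumes UTF-8 input in which each tag of interest sits on a single line.

```python
import xml.parsers.expat


def find_source_lines(xml_bytes, tag, predicate):
    """Return (line_number, raw_line) pairs for matching start tags.

    Hypothetical helper: assumes UTF-8 input and single-line tags.
    """
    hits = []
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        # Inside a callback, CurrentLineNumber is the 1-based line on
        # which the current event (here, the start tag) begins.
        if name == tag and predicate(attrs):
            hits.append(parser.CurrentLineNumber)

    parser.StartElementHandler = on_start
    parser.Parse(xml_bytes, True)

    # bytes.splitlines() only splits on \n, \r and \r\n, so the raw
    # lines are returned exactly as they appear in the input.
    raw_lines = xml_bytes.splitlines()
    return [(n, raw_lines[n - 1].decode("utf-8")) for n in hits]


# Illustrative document modelled on the example in the question.
sample = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b"<root>\n"
    b"<sometag attr2='1' attr1=\"3\" />\n"
    b"</root>\n"
)
matches = find_source_lines(sample, "sometag", lambda a: int(a["attr1"]) > 0)
for lineno, text in matches:
    print(lineno, text)
```

Because the text comes straight from the input bytes, quoting style, attribute order, and the form of tag closing are preserved. If a tag spans several lines, only its first line is returned.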
Upvotes: 1