Reputation: 529
I want to get the XPath of each element in an XML file.
XML file:
<root
    xmlns="http://www.w3.org/TR/html4/"
    xmlns:h="http://www.w3schools.com/furniture">
  <table>
    <tr>
      <h:td>Apples</h:td>
      <h:td>Bananas</h:td>
    </tr>
  </table>
</root>
Python code: since a null prefix is not allowed for the default namespace in XPath, I used my own prefix for it.
from lxml import etree

root = etree.parse("MyData.xml")
ns = {'df': 'http://www.w3.org/TR/html4/', 'types': 'http://www.w3schools.com/furniture'}
for e in root.iter():
    b = root.getpath(e)
    print(b)
    r = root.xpath(b, namespaces=ns)
    # I need both b and r here
The XPath output (b) looks like this:
/*
/*/*[1]
/*/*[1]/*[1]
/*/*[1]/*[1]/h:td
I can't get a correct XPath for elements in the default namespace: their names are shown as *. How can I get the XPath with the proper element names?
Upvotes: 4
Views: 5724
Reputation: 16105
You could use getelementpath, which always returns the elements in Clark notation, and replace the namespaces manually:
x = """
<root
    xmlns="http://www.w3.org/TR/html4/"
    xmlns:h="http://www.w3schools.com/furniture">
  <table>
    <tr>
      <h:td>Apples</h:td>
      <h:td>Bananas</h:td>
    </tr>
  </table>
</root>
"""
from lxml import etree

root = etree.fromstring(x).getroottree()
ns = {'df': 'http://www.w3.org/TR/html4/', 'types': 'http://www.w3schools.com/furniture'}
for e in root.iter():
    # Path relative to the root element, in Clark notation ({uri}name)
    path = root.getelementpath(e)
    root_path = '/' + root.getroot().tag
    if path == '.':
        path = root_path
    else:
        path = root_path + '/' + path
    # Swap each {uri} for the matching prefix from ns
    for ns_key in ns:
        path = path.replace('{' + ns[ns_key] + '}', ns_key + ':')
    print(path)
    r = root.xpath(path, namespaces=ns)
    print(r)
Note that getelementpath returns paths relative to the root node, like . and df:table (after the prefix replacement) instead of /df:root and /df:root/df:table, so we use the tag of the root element to manually construct the full path.
Output:
/df:root
[<Element {http://www.w3.org/TR/html4/}root at 0x37f5348>]
/df:root/df:table
[<Element {http://www.w3.org/TR/html4/}table at 0x44bdb88>]
/df:root/df:table/df:tr
[<Element {http://www.w3.org/TR/html4/}tr at 0x37fa7c8>]
/df:root/df:table/df:tr/types:td[1]
[<Element {http://www.w3schools.com/furniture}td at 0x44bdac8>]
/df:root/df:table/df:tr/types:td[2]
[<Element {http://www.w3schools.com/furniture}td at 0x44bdb88>]
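As a side note, the prefix replacement step can be pulled out into a small helper. The following is only a sketch: the function name clark_to_prefixed is made up for illustration and is not part of lxml. It takes a Clark-notation path string (the kind getelementpath returns) and the same prefix-to-URI dict that is passed to xpath(), and it raises an error for any namespace URI that has no prefix in the dict, instead of silently leaving {uri} in the path.

```python
import re


def clark_to_prefixed(path, ns):
    """Rewrite '{uri}name' steps as 'prefix:name' (hypothetical helper).

    path: a path string in Clark notation.
    ns:   dict mapping prefix -> namespace URI, as passed to xpath().
    """
    # Invert the mapping so a prefix can be looked up by URI.
    uri_to_prefix = {uri: prefix for prefix, uri in ns.items()}

    def replace(match):
        uri = match.group(1)
        if uri not in uri_to_prefix:
            raise KeyError("no prefix registered for namespace %r" % uri)
        return uri_to_prefix[uri] + ":"

    # Every '{...}' group in the path is a namespace URI.
    return re.sub(r"\{([^}]*)\}", replace, path)


ns = {
    "df": "http://www.w3.org/TR/html4/",
    "types": "http://www.w3schools.com/furniture",
}
relative = (
    "{http://www.w3.org/TR/html4/}table/"
    "{http://www.w3.org/TR/html4/}tr/"
    "{http://www.w3schools.com/furniture}td[1]"
)
print(clark_to_prefixed(relative, ns))  # df:table/df:tr/types:td[1]
```

In the loop above, this would stand in for the inner for ns_key in ns loop; the root tag still has to be prepended as before to get an absolute path.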
Upvotes: 3