james_dean
james_dean

Reputation: 1517

Getting the Path of an XML Node in Python

I am often working with quite complex documents that I am effectively reverse engineering. I need to write code to modify the value of certain nodes with the path traversed to reach the node acting as a condition.

When examining the XML data to understand it's structure, it would be very useful to have a representation on of this path. I can then use the path in the code to get to that specific node.

For example, I can get the representation Catalog > Book > Textbook > Author from the document below.

Are there any libraries in Python that allows me to do this?

<xml>
    <Catalog>
        <Book>
            <Textbook>
                <Author ="James" />
            </Textbook>
        </Book>
        <Journal>
            <Science>
                <Author ="James" />
            </Science>
        </Journal>
    </Catalog>
</xml>

Upvotes: 5

Views: 7952

Answers (1)

Pedro Romano
Pedro Romano

Reputation: 11203

The lxml library has a etree.Element.getpath method (search for Generating XPath expressions in the previous link) "which returns a structural, absolute XPath expression to find that element". Here's an example from the lxml library documentation:

>>> a  = etree.Element("a")
>>> b  = etree.SubElement(a, "b")
>>> c  = etree.SubElement(a, "c")
>>> d1 = etree.SubElement(c, "d")
>>> d2 = etree.SubElement(c, "d")

>>> tree = etree.ElementTree(c)
>>> print(tree.getpath(d2))
/c/d[2]
>>> tree.xpath(tree.getpath(d2)) == [d2]
True

Upvotes: 6

Related Questions