Angelo

Reputation: 787

Get only last part of xpath

I am using lxml in Python 2.7 to parse an XML file.

The file looks like this:

...
<LM>sua</LM>
<LM>citt&agrave;</LM>
<LM>e</LM>
<LM>l'</LM>
<LM>alto</LM>
<LM>seggio</LM>:
     </l><l>
<LM>oh</LM>
<LM>felice</LM>
<LM>colui</LM>
<LM>cu'</LM>
<LM>ivi</LM>
<LM>elegge</LM>!.
     </l><l>
<LM> E</LM>
<LM>io</LM>
<LM>a</LM>
<LM>lui</LM>:
...

I am iterating through the tree looking for LM nodes.

for node in tree.iterfind(".//LM"):
    print tree.getpath(node.getparent())

and I get the following output for each node:

'/TEI.2/text/body/div1/l[480]'

So in this case the current LM node is under the 480th l node. Is there a way to get this 480 other than the following?

In [77]: int(tree.getpath(node.getparent()).split('/')[5][2:].replace(']',''))
Out[77]: 480

I mean an elegant way via XPath.

Upvotes: 0

Views: 290

Answers (1)

Abel

Reputation: 57159

So in this case the current LM node is under the 480th l node. Is there a way to get this 480 other than the following?

int(tree.getpath(node.getparent()).split('/')[5][2:].replace(']',''))

If I understand you correctly, you merely want the position relative to its parent? You can have the XPath return this last position by doing:

node.find("position()")

In XPath 1.0, position() returns the position of the context node within the node list currently being evaluated. However, this call does not work in lxml: find() and iterfind() support only the limited ElementPath subset, which can select nodes but cannot return a value. lxml's xpath() method, by contrast, evaluates full XPath 1.0 expressions and can return numbers and strings.
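For the index the question asks about, one option is to count the preceding l siblings of the parent through lxml's xpath() method. The following is only a minimal sketch: the sample document is a hypothetical miniature of the file in the question, line_index is an illustrative helper name, and it assumes every LM is a direct child of an l element.

```python
from lxml import etree

# Hypothetical miniature of the TEI file from the question.
SAMPLE = b"""<TEI.2><text><body><div1>
<l><LM>sua</LM><LM>alto</LM></l>
<l><LM>oh</LM></l>
<l><LM>E</LM><LM>io</LM></l>
</div1></body></text></TEI.2>"""

tree = etree.fromstring(SAMPLE).getroottree()


def line_index(lm):
    """Return the 1-based index of lm's parent among its <l> siblings."""
    # count() comes back from lxml as a float, so convert it to int.
    return int(lm.xpath("count(../preceding-sibling::l) + 1"))


# Pair each LM's text with the index of the <l> that contains it.
indices = [(lm.text, line_index(lm)) for lm in tree.iterfind(".//LM")]
print(indices)
```

Counting only preceding siblings named l mirrors how getpath() numbers same-named siblings, so the result should match the bracketed index in the path when the parent is an l element.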

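If you can use XSLT in Python, you can get all the positions there. With an XPath 2.0 processor the expression //LM/position() would do; XPath 1.0 does not allow a function call as a path step, so with XSLT 1.0 you need a template. And to get the path as well, you have to do a bit more: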

<xsl:template match="/">
    <xsl:apply-templates select="//LM" />
</xsl:template>

<xsl:template match="LM">
    <xsl:text>Position: </xsl:text>
    <xsl:value-of select="position()" />
    <xsl:text>, XPath: </xsl:text>
    <xsl:apply-templates select="ancestor::*" mode="path" />
    <xsl:text>&#xA;</xsl:text>
</xsl:template>

<xsl:template match="*" mode="path">
    <xsl:text>/</xsl:text>
    <xsl:value-of select="name()" />
</xsl:template>

This will output a bunch of lines like:

Position: 4, XPath: /a/b/c
Position: 9, XPath: /a/b/d
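If lxml is available, these templates can be run from Python through etree.XSLT. This is a sketch under assumptions: the xsl:stylesheet wrapper and the text output method are added here, and the sample document is hypothetical. Note that position() in this setup counts within the whole node list selected by //LM.

```python
from lxml import etree

# The templates from the answer, wrapped in a stylesheet element.
# The wrapper and the text output method are assumptions added here.
STYLESHEET = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:apply-templates select="//LM"/>
  </xsl:template>

  <xsl:template match="LM">
    <xsl:text>Position: </xsl:text>
    <xsl:value-of select="position()"/>
    <xsl:text>, XPath: </xsl:text>
    <xsl:apply-templates select="ancestor::*" mode="path"/>
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>

  <xsl:template match="*" mode="path">
    <xsl:text>/</xsl:text>
    <xsl:value-of select="name()"/>
  </xsl:template>
</xsl:stylesheet>"""

# Hypothetical input document.
SAMPLE = b"<a><b><LM>x</LM><LM>y</LM></b><c><LM>z</LM></c></a>"

# Compile the stylesheet once, then apply it to the parsed document.
transform = etree.XSLT(etree.fromstring(STYLESHEET))
result = transform(etree.fromstring(SAMPLE).getroottree())

# With method="text", str(result) is the plain text output.
lines = str(result).splitlines()
for line in lines:
    print(line)
```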

Upvotes: 1
