A. J. Green

Reputation: 83

XPath node-set nesting order selection

Is there an XPath 1.0 expression that I could use, starting from the div[@id='rootTag'] context, to select the nested span descendants according to how deeply they are nested? For example, could something like span[2] select the second most deeply nested span tag, rather than the second span child of the same parent element?

<div id='rootTag'>
    <span>Test</span>
    <div>   
       <span>Test</span>
       <span>Test</span>
    </div>
    <div>
       <span>Test</span>
    </div>
   <div>  
     <div>
       <div>  
           <div>
              <span>Test</span>
           </div>
             <span>Test</span>
        </div>
     </div>
    </div>
</div>

Upvotes: 1

Views: 40

Answers (1)

Jack Fleeting

Reputation: 24930

It's a bit (a lot...) of a hack, but it can be done this way:

Assume your HTML looks like this:

levels = """<div id='rootTag'>
  <span>Level2</span>
  <div>   
    <span>Level3</span>
    <div>
     <span>Level4</span>
    </div>
  </div>
  <div>  
    <span>Level3</span>
  </div>
  <div>  
    <div>
      <div>  
        <div>
          <span>Level6</span>
        </div>
        <span>Level5</span>
      </div>
    </div>
  </div>
</div>"""

We then do this:

# First, parse the data. etree.fromstring() is an XML parser,
# so the HTML has to be well-formed or this won't work.
from lxml import etree
root = etree.fromstring(levels)
tree = etree.ElementTree(root)

# Collect the absolute path of every <span> element
paths = [tree.getpath(e) for e in root.iter('span')]

# The number of '/' characters in a path is the element's nesting level
nests = [p.count('/') for p in paths]  # or, alternatively:
# nests = [tree.getpath(e).count('/') for e in root.iter('span')]

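The two lists are parallel: nests[i] is the nesting level of the element whose path is paths[i]. So we can look up a level in the nests list and use its index to pull the corresponding path out of the paths list. For example, to get the <span> element with the deepest nesting level: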

deepest = nests.index(max(nests))
print(paths[deepest], root.xpath(paths[deepest])[0].text)

Output:

/div/div[3]/div/div/div/span Level6
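
One caveat: list.index() returns only the first match, so if several <span> elements share the deepest level, the lines above report just one of them. A minimal sketch of collecting all of them, using a small made-up sample (the Level4a/Level4b document and the deepest_paths/deepest_texts names are only for illustration):

```python
from lxml import etree

# Hypothetical sample: two <span> elements are tied for the deepest level
sample = """<div id='rootTag'>
  <span>Level2</span>
  <div>
    <div><span>Level4a</span></div>
    <div><span>Level4b</span></div>
  </div>
</div>"""

root = etree.fromstring(sample)
tree = etree.ElementTree(root)

# Same two parallel lists as in the answer
paths = [tree.getpath(e) for e in root.iter('span')]
nests = [p.count('/') for p in paths]

# Keep every path whose nesting level equals the maximum, not just the first
max_depth = max(nests)
deepest_paths = [p for p, n in zip(paths, nests) if n == max_depth]
deepest_texts = [root.xpath(p)[0].text for p in deepest_paths]

print(deepest_paths, deepest_texts)
```

The same zip() filter works for any other level by comparing n against that level instead of max_depth.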

Or to extract the first <span> element found at nesting level 4:

level4 = nests.index(4)
print(paths[level4], root.xpath(paths[level4])[0].text)

Output:

/div/div[1]/div/span Level4
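
If the level you want is known in advance, plain XPath 1.0 may be enough without building the path lists: count(ancestor-or-self::*) gives the number of elements from the document root down to the node, which equals the '/' count used above when rootTag is the document root. A rough sketch under that assumption (spans_at_level is a hypothetical helper name, and the sample is a shortened version of the document above):

```python
from lxml import etree

# Shortened version of the sample document from the answer
levels = """<div id='rootTag'>
  <span>Level2</span>
  <div>
    <span>Level3</span>
    <div>
      <span>Level4</span>
    </div>
  </div>
  <div>
    <span>Level3</span>
  </div>
</div>"""

root = etree.fromstring(levels)

def spans_at_level(context, level):
    """Return every <span> below `context` at the given nesting level.

    The level counts the <span> itself plus all of its ancestor elements,
    so a <span> directly under the root element is at level 2.
    """
    # $level is an XPath variable; lxml fills it from the keyword argument
    return context.xpath(
        './/span[count(ancestor-or-self::*) = $level]', level=level
    )

for lvl in (2, 3, 4):
    print(lvl, [s.text for s in spans_at_level(root, lvl)])
```

Unlike nests.index(), this returns every <span> at the requested level. It does not pick the deepest level for you; that step still has to happen outside the expression, for example with max(nests) as above.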

Upvotes: 1
