XPath to look for subtree

Question

I'm scraping an HTML document whose structure changes all the time. Even the CSS class names change, so I can't rely on them. However, one thing never changes: the value is always contained in a subtree exactly like the following:


  
    wanted value
    wanted value

Can this be expressed as an XPath expression?

It should not match:


  
     1, one too little 
     2 
     3, one too many 
     4, two too many

I plan to do this using lxml for Python.

Mark Veenstra · Accepted Answer

If the wanted value is always at the third level of nested span elements, the following XPath will work:

//span/span/span[1]

When applied to the following HTML document:


  
    Your Title
  
  
    
    
      
        wanted value
        
      
    
    
    
    
      
        wanted value

The result will be:

wanted value
wanted value
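
For anyone who wants to try the expression with lxml, here is a minimal sketch. The sample markup is a hypothetical stand-in (its tag layout is an assumption, not the asker's real page), and it presumes lxml is installed.

```python
from lxml import html

# Hypothetical sample document, invented for illustration only.
SAMPLE = """
<html>
  <body>
    <div>
      <span>
        <span>
          <span>wanted value</span>
          <span>other</span>
        </span>
      </span>
    </div>
    <span>
      <span>only two levels deep</span>
    </span>
  </body>
</html>
"""

tree = html.fromstring(SAMPLE)

# Select the first span child of a span that is itself a child of a span.
matches = tree.xpath("//span/span/span[1]")
values = [el.text_content().strip() for el in matches]
print(values)
```

The second group of spans is only two levels deep, so nothing in it is selected.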

EDIT

If you only want the first span at the third level, and only when that level contains exactly two spans, you can use the following XPath:

//span/span[count(span) = 2]/span[1]
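
A small lxml sketch of how the `count(span) = 2` predicate filters candidates. The markup and the helper name `select_texts` are hypothetical, made up for this illustration; the three groups have one, two and three third-level spans respectively.

```python
from lxml import html

# Hypothetical markup: groups with 1, 2 and 3 innermost spans.
SAMPLE = """
<div>
  <span><span>
    <span>A</span>
  </span></span>
  <span><span>
    <span>B1</span><span>B2</span>
  </span></span>
  <span><span>
    <span>C1</span><span>C2</span><span>C3</span>
  </span></span>
</div>
"""

def select_texts(markup, expression):
    """Return the text of every element matched by the XPath expression."""
    tree = html.fromstring(markup)
    return [el.text for el in tree.xpath(expression)]

# Only the middle group has exactly two span children, so only it qualifies.
first_only = select_texts(SAMPLE, "//span/span[count(span) = 2]/span[1]")

# Dropping the trailing [1] returns both spans of the qualifying group.
both = select_texts(SAMPLE, "//span/span[count(span) = 2]/span")

print(first_only)
print(both)
```

If both values in the subtree are wanted rather than just the first, the variant without `[1]` is the one to use.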
