Reputation: 540

Correct XPath yields empty result

I'm trying to select a node from an HTML page based on the id of the node. Due to external restrictions I have to do that using XPath.

I want to get the container element of the postings of a forum, in this case of Delphi-PRAXiS. I have attached a simple example of the page.

The node I need is a div with the id "posts", so my query would be //div[@id='posts']. The problem is, the result is an empty list. If I query using //*[@id='posts'] I get my node.

I tried this using the XmlDocument class of the framework.

Eventually I want to use the Html Agility Pack (which uses the same XPath class as the XmlDocument), but if I use that I get no results regardless of the query string.

I know the query string is correct, so my guess would be that the parser is faulty. But somehow I doubt Microsoft would ship a broken XPath parser.

Any suggestions?

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="de">  
  <head>    
    <title>Some title</title>  
  </head>  

  <body>    
    <div>          
      <div class="page">                 
        <div id="dp-page" class="round-all">          
          <div class="dpbox">               
            <div id="posts">
              Here we go!               
            </div>            
          </div>            
        </div>          
      </div>        
    </div>  
  </body>
</html>

I found another clue: if the node <a name="poststop" id="poststop"></a> is present in the XML, the query fails; otherwise it succeeds. But why?

Upvotes: 2

Answers (2)

Charleh

Reputation: 14002

Though I don't recommend it, you can also load the document without namespaces using XmlTextReader:

// Create XML data element
xmlData = new XmlDocument();

// Read using XmlTextReader to strip namespaces
using (XmlTextReader tr = new XmlTextReader(sourceFile))
{
    tr.Namespaces = false;
    xmlData.Load(tr);
}

I use this for some doc processing I do to ensure that I don't need to worry about namespaces when I'm searching for fields using database config data.
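As a rough sketch of how this plays out with the page from the question: once the document is loaded with namespaces switched off, the original unprefixed query should work unchanged. The xhtml string below is a made-up, cut-down version of the posted markup, and the DOCTYPE is left out so the reader does not have to resolve the external DTD.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical cut-down version of the page from the question (DOCTYPE omitted).
string xhtml =
    "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>" +
    "<div class=\"dpbox\"><div id=\"posts\">Here we go!</div></div>" +
    "</body></html>";

var doc = new XmlDocument();

// With Namespaces = false the reader treats xmlns as an ordinary attribute,
// so every element ends up in no namespace.
using (var reader = new XmlTextReader(new StringReader(xhtml)))
{
    reader.Namespaces = false;
    doc.Load(reader);
}

// The unprefixed query from the question now has something to match.
XmlNodeList posts = doc.SelectNodes("//div[@id='posts']");
Console.WriteLine(posts.Count);
```

The trade-off is that the loaded document no longer carries any namespace information, which is why this is only sensible when nothing downstream depends on it.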

Upvotes: 0

Tim Rogers

Reputation: 21723

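XHTML elements are in the http://www.w3.org/1999/xhtml namespace, so you need to specify that in your selector. In XPath 1.0 an unprefixed name such as div only matches elements in no namespace, which is why //div[@id='posts'] returns nothing while //*[@id='posts'] finds the node. Your code should look something like this (using XDocument is a bit easier where namespaces are concerned):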

var nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml");
var nodelist = doc.SelectNodes("//xhtml:div[@id='posts']", nsmgr);
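To make the difference concrete, here is a small self-contained sketch that runs the three queries against a made-up, cut-down version of the page (the xhtml string and the variable names are assumptions for illustration; the DOCTYPE is omitted so no external DTD is fetched). It ends with one possible XDocument variant, where the namespace is combined with the element name instead of being registered under a prefix.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical cut-down version of the page from the question (DOCTYPE omitted).
string xhtml =
    "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>" +
    "<div class=\"dpbox\"><div id=\"posts\">Here we go!</div></div>" +
    "</body></html>";

var doc = new XmlDocument();
doc.LoadXml(xhtml);

var nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml");

// Unprefixed "div" means "div in no namespace", so nothing matches.
int unprefixed = doc.SelectNodes("//div[@id='posts']").Count;
// "*" matches elements regardless of namespace.
int wildcard = doc.SelectNodes("//*[@id='posts']").Count;
// The "xhtml" prefix is bound to the XHTML namespace via the manager.
int prefixed = doc.SelectNodes("//xhtml:div[@id='posts']", nsmgr).Count;

Console.WriteLine($"unprefixed={unprefixed} wildcard={wildcard} prefixed={prefixed}");

// XDocument variant: qualify the element name with an XNamespace.
XNamespace ns = "http://www.w3.org/1999/xhtml";
XElement posts = XDocument.Parse(xhtml)
    .Descendants(ns + "div")
    .FirstOrDefault(e => (string)e.Attribute("id") == "posts");

Console.WriteLine(posts?.Value);
```

Note that the id attribute stays unqualified in both variants: a default namespace declaration applies to elements, not to unprefixed attributes.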

Upvotes: 3
