Xpath query for HTML table within XML in PHP DOMDocument

Question

I have an XML file with the following tree structure:



    Videos
    https://www.example.com/r/videos/
    A long description of the video.
    ...
    
    
        The most used Jazz lick in history.
        
        http://www.example.com/
        
        
         http://www.example.com/
        
    Mon, 07 Sep 2015 14:43:34 +0000
    
    
        
            
                
                    
                
            
             submitted by 
                 jcepiano 
                

                [link]
                
                    [508 comments]
                
            
        
    
    
    The most used Jazz lick in history.

Here, the HTML table element is embedded inside the XML, and that's confusing me.

Now I want to get the text node values of //channel/item/title and the href value of //channel/item/description/table/tr/td[1]/a[1] (the a whose text node value is "[link]").

In the second case above, I am looking for the value of the second a (the one whose text node value is "[link]"), inside the second td inside tr, table, description, item, channel.

I am using PHP's DOMDocument.

I have been looking for a solution to this for two days now. Can you please tell me how to do this?

I also need to count the total number of items in the feed. Right now I am doing it like this:

...
$queryResult = $xpathvar->query('//item/title');
// count the matched <title> nodes one at a time
$total = 0;
foreach ($queryResult as $result) {
    $total++;
}
echo $total;

I also need a reference link for the rules of XPath query syntax.

Thanks in advance! :)

Abel · Accepted Answer

You wrote that you wanted the length of the result set of the following query:

$queryResult = $xpathvar->query('//item/title');

I assume that $xpathvar here is of type DOMXPath. If so, its query() method returns a DOMNodeList, which has a length property (see the DOMNodeList documentation in the PHP manual). Instead of using foreach, simply use:

$length = $xpathvar->query('//item/title')->length;
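A minimal, self-contained sketch of that approach, assuming a small made-up feed (the element names are taken from the XPath in the question; the second item is invented so that the count is more than one):

```php
<?php
// Hypothetical sample feed: element names follow the question's XPath,
// not the (unshown) real feed. The second item is made up.
$xml = <<<'XML'
<rss>
  <channel>
    <title>Videos</title>
    <item><title>The most used Jazz lick in history.</title></item>
    <item><title>A second, made-up item</title></item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpathvar = new DOMXPath($doc);

// query() returns a DOMNodeList; its length property is the number of
// matches, so no foreach loop or hand-maintained counter is needed.
$total = $xpathvar->query('//item/title')->length;
echo $total, "\n"; // prints 2
```

Querying //channel/item and reading its length would give the same count here, since each item has exactly one title.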

> Now I want to pick the text node values for //channel/item/title

Which you can get with the expression //channel/item/title/text().
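To turn that expression into PHP values, one option is to iterate over the DOMText nodes it returns. This sketch assumes a tiny hypothetical feed with a single item:

```php
<?php
// Hypothetical one-item feed; element names follow the question's XPath.
$xml = <<<'XML'
<rss>
  <channel>
    <title>Videos</title>
    <item><title>The most used Jazz lick in history.</title></item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$titles = [];
// text() selects the text-node children of each item's <title>
// (the channel's own <title> is excluded by the item step).
foreach ($xpath->query('//channel/item/title/text()') as $textNode) {
    $titles[] = $textNode->nodeValue;
}
print_r($titles);
```

If you only need the first title, $xpath->evaluate('string(//channel/item/title)') returns it directly as a PHP string.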

> and href value for //channel/item/description/table/tr/td[1]/a[1] (with a text node value = "[link]")

Your expression selects every tr, then the first td under each of them, then the first a in that td. But in your source, the first a does not have the text value "[link]". If that is still what you want, you can use:

//channel/item/description/table/tr/td[1]/a[1]/@href

but it looks like what you actually want is:

//channel/item/description/table/tr/td/a[. = "[link]"][1]/@href

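which, for each td, selects the first a element whose string value (its text node) is "[link]". Because the [1] is applied per td, a feed with several items gives one href per item. To get only the first match in the whole document, wrap the path in parentheses: (//channel/item/description/table/tr/td/a[. = "[link]"])[1]/@href.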
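The sketch below runs the text-predicate expression against a hypothetical two-item feed, assuming (as this answer does) that the table is real child markup of description rather than entity-escaped text. It also shows a parenthesized variant that keeps only the first match in the whole document. The hrefs, the second item and its user name are placeholders:

```php
<?php
// Hypothetical two-item feed with an unescaped HTML table in <description>.
// All hrefs and the second item are invented for illustration.
$xml = <<<'XML'
<rss>
  <channel>
    <item>
      <title>First item</title>
      <description><table><tr>
        <td><a href="http://www.example.com/thumb1">thumb</a></td>
        <td>submitted by <a href="http://www.example.com/u1">jcepiano</a>
          <a href="http://www.example.com/link1">[link]</a>
          <a href="http://www.example.com/comments1">[508 comments]</a></td>
      </tr></table></description>
    </item>
    <item>
      <title>Second item</title>
      <description><table><tr>
        <td><a href="http://www.example.com/thumb2">thumb</a></td>
        <td>submitted by <a href="http://www.example.com/u2">user2</a>
          <a href="http://www.example.com/link2">[link]</a></td>
      </tr></table></description>
    </item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// [1] after the text predicate is evaluated per <td>: one href per item.
$perItem = [];
foreach ($xpath->query('//channel/item/description/table/tr/td/a[. = "[link]"][1]/@href') as $attr) {
    $perItem[] = $attr->value; // DOMAttr::$value holds the attribute text
}

// Parentheses apply [1] to the whole node-set: first match in the document.
$first = $xpath->evaluate('string((//channel/item/description/table/tr/td/a[. = "[link]"])[1]/@href)');

print_r($perItem);
echo $first, "\n"; // prints http://www.example.com/link1
```

If the link text in the real feed has whitespace around it, comparing with normalize-space(.) = "[link]" is the safer predicate.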

> Above in 2nd case, I am looking for the value of 2nd a (with a text node value = "[link]"), inside 2nd td inside tr, table, description, item, channel.

Not sure if this was a separate question or meant to explain the previous one. Regardless, the answer is the same as for the previous one, unless you explicitly want to search for the second a, etc. (i.e., search by position), in which case you can use numeric predicates such as td[2]/a[2].
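For comparison, here is a minimal sketch of both approaches on a hypothetical single-item feed laid out as described in the question (the [link] anchor is the second a in the second td; the hrefs are placeholders):

```php
<?php
// Hypothetical single-item feed; hrefs are invented for illustration.
$xml = <<<'XML'
<rss>
  <channel>
    <item>
      <description><table><tr>
        <td><a href="http://www.example.com/thumb">thumb</a></td>
        <td>submitted by <a href="http://www.example.com/user">jcepiano</a>
          <a href="http://www.example.com/link">[link]</a>
          <a href="http://www.example.com/comments">[508 comments]</a></td>
      </tr></table></description>
    </item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Purely positional: second <td> of the row, then the second <a> inside it.
$byPosition = $xpath->evaluate('string(//channel/item/description/table/tr/td[2]/a[2]/@href)');

// Text-based: the <a> whose text is "[link]", wherever it sits in the row.
$byText = $xpath->evaluate('string(//channel/item/description/table/tr/td/a[. = "[link]"]/@href)');

echo $byPosition, "\n";
echo $byText, "\n";
```

The positional version depends on the exact layout of the row, so it tends to break if the feed adds or reorders cells or links; matching on the text avoids that.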

Note: you start most of your expressions with //expr, which essentially means: search the whole tree, at any depth, for expr. This is potentially expensive. If all you need is a (relative) root node whose starting point you already know, it is better, and usually much faster, to use a direct path. In your case, you can replace //channel with /*/channel, because channel is a direct child of the root element.
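A small sketch, on a hypothetical minimal feed whose root element is rss, showing that the direct path selects the same nodes. It also shows a variant that looks up channel once and evaluates a relative path against it through the optional context-node argument of DOMXPath::query():

```php
<?php
// Hypothetical minimal feed; the root element is <rss>.
$xml = <<<'XML'
<rss>
  <channel>
    <item><title>The most used Jazz lick in history.</title></item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// '//' searches every depth of the document for <channel> elements.
$anywhere = $xpath->query('//channel/item/title')->length;

// '/*/channel' only looks at the children of the root element.
$direct = $xpath->query('/*/channel/item/title')->length;

// Or fetch <channel> once and evaluate relative paths against it.
$channel = $xpath->query('/*/channel')->item(0);
$relative = $xpath->query('item/title', $channel)->length;

echo $anywhere, ' ', $direct, ' ', $relative, "\n"; // prints "1 1 1"
```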
