I have an XML file with the following tree structure:
```xml
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>Videos</title>
    <link>https://www.example.com/r/videos/</link>
    <description>A long description of the video.</description>
    <image>...</image>
    <atom:link rel="self" href="http://www.example.com/videos/.xml" type="application/rss+xml"/>
    <item>
      <title>The most used Jazz lick in history.</title>
      <link>
        http://www.example.com/
      </link>
      <guid isPermaLink="true">
        http://www.example.com/
      </guid>
      <pubDate>Mon, 07 Sep 2015 14:43:34 +0000</pubDate>
      <description>
        <table>
          <tr>
            <td>
              <a href="http://www.example.com/">
                <img src="http://www.example.com/.jpg" alt="The most used Jazz lick in history." title="The most used Jazz lick in history." />
              </a>
            </td>
            <td> submitted by
              <a href="http://www.example.com/"> jcepiano </a>
              <br/>
              <a href="http://www.youtube.com/">[link]</a>
              <a href="http://www.example.com/">
                [508 comments]
              </a>
            </td>
          </tr>
        </table>
      </description>
      <media:title>The most used Jazz lick in history.</media:title>
      <media:thumbnail url="http://example.jpg"/>
    </item>
  </channel>
</rss>
```
Here, the HTML `table` element is embedded inside the XML, and that's confusing me.
Now I want to pick the text node values for `//channel/item/title` and the `href` value for `//channel/item/description/table/tr/td[1]/a[1]` (with a text node value = `"[link]"`).

Above, in the 2nd case, I am looking for the value of the 2nd `a` (with a text node value = `"[link]"`) inside the 2nd `td` inside `tr`, `table`, `description`, `item`, `channel`.
I am using PHP's `DOMDocument`. I have been looking for a solution to this for two days now; can you please tell me how to do it?

I also need to count the total number of items in the feed. Right now I am doing it like this:
```php
// ...
$queryResult = $xpathvar->query('//item/title');
$total = 0; // start at 0, otherwise the count is one too high
foreach ($queryResult as $result) {
    $total++;
}
echo $total;
```
I would also like a reference link for the rules of XPath query syntax.
Thanks in advance! :)
I finally got it working with the code below:
```php
$url = "https://www.example.com/r/videos/.xml";

$feed_dom = new DOMDocument();
$feed_dom->preserveWhiteSpace = false; // must be set before load() to take effect
$feed_dom->load($url);

$items = $feed_dom->getElementsByTagName('item');
foreach ($items as $item) {
    $title = $item->getElementsByTagName('title')->item(0)->nodeValue;
    // This relies on <description> holding the table as escaped HTML text,
    // so that nodeValue returns an HTML string
    $desc_table = $item->getElementsByTagName('description')->item(0)->nodeValue;
    echo $title . "<br>";

    // Parse the description's HTML as a separate document
    $table_dom = new DOMDocument();
    $table_dom->preserveWhiteSpace = false;
    $table_dom->loadHTML($desc_table);
    $xpath = new DOMXPath($table_dom);

    // The 2nd <a> inside the 2nd <td> is the "[link]" anchor
    $yt_link_node = $xpath->query("//table/tr/td[2]/a[2]");
    foreach ($yt_link_node as $yt_link) {
        $yt = $yt_link->getAttribute('href');
        echo $yt . "<br>";
        echo "<br>";
    }
}
```
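The code above only works if `<description>` carries the table as escaped HTML text (or CDATA), so that `nodeValue` returns an HTML string that `loadHTML()` can parse. The sketch below reproduces that two-step approach against a small inline feed; the escaped description is an assumption modelled on the question's sample, not a copy of the real feed, and `$links` is just an illustrative name.

```php
<?php
// Hypothetical feed: <description> holds the table as escaped HTML text,
// which is what the two-step approach relies on when it reads nodeValue.
$xml = <<<'XML'
<rss version="2.0">
  <channel>
    <item>
      <title>The most used Jazz lick in history.</title>
      <description>
        &lt;table&gt;&lt;tr&gt;
        &lt;td&gt;&lt;a href="http://www.example.com/"&gt;thumbnail&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt; submitted by &lt;a href="http://www.example.com/"&gt; jcepiano &lt;/a&gt; &lt;br/&gt;
        &lt;a href="http://www.youtube.com/"&gt;[link]&lt;/a&gt;
        &lt;a href="http://www.example.com/"&gt;[508 comments]&lt;/a&gt;&lt;/td&gt;
        &lt;/tr&gt;&lt;/table&gt;
      </description>
    </item>
  </channel>
</rss>
XML;

$feed = new DOMDocument();
$feed->loadXML($xml);

$links = [];
foreach ($feed->getElementsByTagName('item') as $item) {
    // nodeValue decodes the entities, so this is an HTML string
    $html = $item->getElementsByTagName('description')->item(0)->nodeValue;

    // Parse the HTML fragment as its own document, collecting any parser
    // warnings instead of printing them
    $table = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $table->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The 2nd <a> in the 2nd <td> is the "[link]" anchor
    $xpath = new DOMXPath($table);
    foreach ($xpath->query('//table/tr/td[2]/a[2]/@href') as $href) {
        $links[] = $href->nodeValue;
    }
}

echo implode(PHP_EOL, $links), PHP_EOL;
```

Selecting `@href` directly avoids the separate `getAttribute()` call, since an attribute node's `nodeValue` is the attribute's value.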
Thanks, Abel; your help was very useful in getting this done. :)
You wrote that you wanted the length of the result set of the following query:
```php
$queryResult = $xpathvar->query('//item/title');
```

I assume that `$xpathvar` here is of type `DOMXPath`. If so, its `query()` method returns a `DOMNodeList`, which has a `length` property (see `DOMNodeList` in the PHP manual). Instead of using `foreach`, simply use:

```php
$length = $xpathvar->query('//item/title')->length;
```
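For a self-contained check, here is a small sketch with a hypothetical two-item feed (the item titles are made up). It shows the `length` approach next to an alternative that lets XPath do the counting with `count()`; `DOMXPath::evaluate()` returns that number as a float, hence the cast.

```php
<?php
// Hypothetical two-item feed, just enough to show counting.
$xml = <<<'XML'
<rss version="2.0">
  <channel>
    <title>Videos</title>
    <item><title>First video</title></item>
    <item><title>Second video</title></item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpathvar = new DOMXPath($doc);

// query() returns a DOMNodeList; its length property is the number of matched nodes
$length = $xpathvar->query('//item/title')->length;

// Alternatively, count the <item> elements in XPath itself
$itemCount = (int) $xpathvar->evaluate('count(//item)');

echo $length, ' ', $itemCount, PHP_EOL; // prints "2 2"
```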
> Now I want to pick the text node values for `//channel/item/title`

You can get those with the expression `//channel/item/title/text()`.
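A minimal sketch of that query in PHP, run against a trimmed copy of the feed from the question (loaded with `loadXML()`, so everything is parsed as ordinary XML elements). The channel's own `<title>` is not matched because it is not under an `item`, and `media:title` is not matched because the unprefixed name test `title` only selects elements in no namespace.

```php
<?php
// Trimmed copy of the feed from the question (requires PHP's dom extension).
$xml = <<<'XML'
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <title>Videos</title>
    <item>
      <title>The most used Jazz lick in history.</title>
      <media:title>The most used Jazz lick in history.</media:title>
    </item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// text() selects the text node children of each <title> directly under an <item>
$titles = [];
foreach ($xpath->query('//channel/item/title/text()') as $textNode) {
    $titles[] = $textNode->nodeValue;
}

echo implode(PHP_EOL, $titles), PHP_EOL; // The most used Jazz lick in history.
```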
> and the `href` value for `//channel/item/description/table/tr/td[1]/a[1]` (with a text node value = `"[link]"`)
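Your expression here selects every `tr`, the first `td` under it, then the first `a` in that `td`. But the first `a` does not have the value `"[link]"` in your source. If that is what you want anyway, you can use:

```
//channel/item/description/table/tr/td[1]/a[1]/@href
```

but it looks like you would rather want:

```
//channel/item/description/table/tr/td/a[. = "[link]"][1]/@href
```

which, within each `td`, finds the first `a` element whose string value (its text) is `"[link]"`. To get only the first such `a` in the whole document, wrap the location path in parentheses before applying the position: `(//channel/item/description/table/tr/td/a[. = "[link]"])[1]/@href`.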
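Both expressions can be tried out with a short sketch against a trimmed copy of the sample feed (whitespace inside the `[508 comments]` anchor removed, image attributes dropped). Selecting `@href` returns attribute nodes, and an attribute node's `nodeValue` is the attribute's value.

```php
<?php
// Trimmed copy of the question's feed; the table is parsed as plain XML elements.
$xml = <<<'XML'
<rss version="2.0">
  <channel>
    <item>
      <title>The most used Jazz lick in history.</title>
      <description>
        <table>
          <tr>
            <td><a href="http://www.example.com/"><img src="http://www.example.com/.jpg"/></a></td>
            <td> submitted by
              <a href="http://www.example.com/"> jcepiano </a>
              <br/>
              <a href="http://www.youtube.com/">[link]</a>
              <a href="http://www.example.com/">[508 comments]</a>
            </td>
          </tr>
        </table>
      </description>
    </item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// First <a> in the first <td>: the thumbnail link, not "[link]"
$firstHref = $xpath->query('//channel/item/description/table/tr/td[1]/a[1]/@href')
    ->item(0)->nodeValue;

// First <a> (within each <td>) whose string value is exactly "[link]"
$linkHref = $xpath->query('//channel/item/description/table/tr/td/a[. = "[link]"][1]/@href')
    ->item(0)->nodeValue;

echo $firstHref, PHP_EOL; // http://www.example.com/
echo $linkHref, PHP_EOL;  // http://www.youtube.com/
```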
> Above, in the 2nd case, I am looking for the value of the 2nd `a` (with a text node value = `"[link]"`) inside the 2nd `td` inside `tr`, `table`, `description`, `item`, `channel`.
Not sure if this was a separate question or meant to explain the previous one. Regardless, the answer is the same as for the previous one, unless you explicitly want to search for the 2nd `a` etc. (i.e., search by position), in which case you can use numeric predicates such as `td[2]/a[2]`.
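For the position-based variant, here is a sketch with numeric predicates, again on a trimmed copy of the sample feed: `td[2]` is the second `td` child of the `tr`, and `a[2]` is the second `a` child of that `td`. The `br` between the anchors is not counted, because the predicate is applied only to the `a` elements selected by that step. Wrapping the path in XPath's `string()` makes `evaluate()` return the `href` directly as a PHP string.

```php
<?php
// Trimmed copy of the question's feed; the table is parsed as plain XML elements.
$xml = <<<'XML'
<rss version="2.0">
  <channel>
    <item>
      <description>
        <table>
          <tr>
            <td><a href="http://www.example.com/"><img src="http://www.example.com/.jpg"/></a></td>
            <td> submitted by
              <a href="http://www.example.com/"> jcepiano </a>
              <br/>
              <a href="http://www.youtube.com/">[link]</a>
              <a href="http://www.example.com/">[508 comments]</a>
            </td>
          </tr>
        </table>
      </description>
    </item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Purely positional: the second <td> of the row, then the second <a> inside it
$href = $xpath->evaluate('string(//channel/item/description/table/tr/td[2]/a[2]/@href)');

echo $href, PHP_EOL; // http://www.youtube.com/
```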
Note: you start most of your expressions with `//expr`, which essentially means: search the whole tree, at any depth, for `expr`. This is potentially expensive, and if you already know where the (relative) root node you need sits, it is better, and generally more efficient, to use a direct path. In your case, you can replace `//channel` with `/*/channel` (because `channel` is a direct child of the root element).
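A quick way to confirm that the direct path selects the same nodes as the `//` form, on a trimmed copy of the sample feed; this sketch only compares results and does not measure any timing. `/rss/channel` would work just as well here, since the root element is `rss`.

```php
<?php
// Trimmed copy of the question's feed.
$xml = <<<'XML'
<rss version="2.0">
  <channel>
    <title>Videos</title>
    <item>
      <title>The most used Jazz lick in history.</title>
    </item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// "//" searches the whole tree; "/*/channel" starts at the root element's child
$deep   = $xpath->query('//channel/item/title');
$direct = $xpath->query('/*/channel/item/title');

echo $deep->length, ' ', $direct->length, PHP_EOL; // 1 1
echo $direct->item(0)->textContent, PHP_EOL;      // The most used Jazz lick in history.
```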