Irene T.

Reputation: 1393

Parsing specific line from xml with same name

I have an XML file with 10 records, each structured like this:

<entry>
<title>My Title</title>
<link rel="alternate" type="text/html" href="http://myweb.com/posts/one.html"/>
<published>2014-07-07T00:34:00+00:00</published>
<updated>2014-07-07T00:34:00+00:00</updated>
<id>http://myweb.com/posts/one.html</id>
<author>
<name>Myweb.com</name>
</author>
<content>
Some Content Here
</content>
<link rel="enclosure" href="http://myweb.com/uploads/300px-300px.jpg" type="image/jpeg" length=""/>
</entry>

I am using the code below to parse it, and it's working almost perfectly, except that I can't fetch the image URL from the second <link> line:

 <link rel="enclosure" href="http://myweb.com/uploads/300px-300px.jpg" type="image/jpeg" length=""/>

My code is:

$url = "http://myweb.com/posts.xml";
$xml = simplexml_load_file($url);
foreach($xml->entry as $PRODUCT) {

$my_title = trim($PRODUCT->title);
$url = trim($PRODUCT->id);
$im = (string)$PRODUCT->xPath('//link[@rel="enclosure"]');

echo $my_title . " " . $url . " " . $im;
echo "<br>";

}

This: $im = (string)$PRODUCT->xPath('//link[@rel="enclosure"]'); returns "Array" and not the URL inside href.

Thanks

Upvotes: 0

Views: 104

Answers (2)

IMSoP

Reputation: 97783

This: $im = (string)$PRODUCT->xPath('//link[@rel="enclosure"]'); returns "Array" and not the URL inside href.

Whenever you see a string containing the word "Array" in PHP, where you were expecting something else, you need to think "hm, I seem to have cast an array to a string, how did that happen?" (Similarly, if you unexpectedly see the string "A", consider the possibility that it's a one-letter substring of "Array").

In this case, the reason why is quite simple: if you look up the manual page for the SimpleXMLElement::xpath() method, you'll see that it returns an array unless there is an error (not finding a match is not an error, and will give you an empty array).
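As a minimal sketch of that behaviour, the snippet below runs xpath() against a small, hypothetical namespace-free feed (the inline XML and the variable names are assumptions for illustration, not the asker's real data). It uses a path relative to the entry and checks the result with empty() rather than relying on the exact "no match" value:

```php
<?php
// Hypothetical one-entry feed without a namespace, for illustration only.
$xml = <<<XML
<feed>
  <entry>
    <title>My Title</title>
    <link rel="alternate" type="text/html" href="http://myweb.com/posts/one.html"/>
    <link rel="enclosure" href="http://myweb.com/uploads/300px-300px.jpg" type="image/jpeg"/>
  </entry>
</feed>
XML;

$feed  = simplexml_load_string($xml);
$entry = $feed->entry[0];

// xpath() hands back a plain PHP array of SimpleXMLElement objects.
$matches = $entry->xpath('link[@rel="enclosure"]');
// No matching node: the result holds nothing to index into.
$none = $entry->xpath('link[@rel="no-such-rel"]');

var_dump(is_array($matches));
var_dump(count($matches));
var_dump(empty($none));

// Index into the array first, then read the attribute and cast it.
$href = (string) $matches[0]['href'];
echo $href, "\n";
```

Casting $matches itself to a string is what produces "Array"; the cast has to be applied to an element of the array instead.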

The only reason this is surprising is that most methods on that class return another instance of the same class, with magic overloads for things like the (string) cast. However, all of those objects represent a more-or-less coherent fragment of the XML document (e.g. one or more consecutive nodes, or siblings filtered by a particular tag name), and can never represent "nothing". An XPath result could be empty, or contain nodes of various types from all over the document; I don't know for sure, but I suspect this is why an array return was chosen here rather than another variety of SimpleXMLElement object.

So $PRODUCT->xPath('//link[@rel="enclosure"]')[0] will give you the first result. If you can't rely on at least PHP 5.4 (which introduced indexing directly into a function's return value), or want to insert a check for no nodes being matched, assign the array to a variable first: $xpath_results = $PRODUCT->xPath('//link[@rel="enclosure"]'); $im = $xpath_results[0];

There are a few extra catches here, though:

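  • Namespaces: as ThW points out, Atom feeds often have an XML namespace declaration, and you need to handle this in your XPath query by registering a prefix, e.g. $product->registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom');, and then using it in your XPath expression (e.g. //atom:link rather than //link).
  • Attribute: you didn't specify that you wanted the href attribute. Either change your XPath expression to select it (//link[@rel="enclosure"]/@href) or change your access to grab it from the SimpleXMLElement returned ($xpath_results[0]['href']).
  • Context: a path beginning with // is evaluated against the whole document, even when xpath() is called on a single entry. Inside a loop, use a path relative to the entry (e.g. atom:link[@rel="enclosure"]); otherwise [0] is the first enclosure link in the document for every entry.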

Stick it all together (and get rid of that ugly and unusual all-caps variable name), and the compact version (no error checking, minimum readability) would be either:

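$product->registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom');
// Relative path: a leading // would search the whole document, not just this entry
$im = (string) $product->xpath('atom:link[@rel="enclosure"]')[0]['href'];

or

$product->registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom');
// Relative path again; here the XPath expression selects the attribute itself
$im = (string) $product->xpath('atom:link[@rel="enclosure"]/@href')[0];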
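Since the compact versions skip error checking, here is one possible guarded variant. It is only a sketch: enclosureHref() is a hypothetical helper name, the inline feed is made up for illustration, and the second entry deliberately has no enclosure link so the "no match" branch is exercised.

```php
<?php
// Hypothetical helper (not part of SimpleXML): returns the enclosure URL of
// one Atom entry, or null when the entry has no enclosure link.
function enclosureHref(SimpleXMLElement $entry): ?string
{
    $entry->registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom');
    // Relative path, so only <link> children of this entry are considered.
    $matches = $entry->xpath('atom:link[@rel="enclosure"]/@href');
    if (empty($matches)) {
        return null; // nothing matched (or the query failed)
    }
    return (string) $matches[0];
}

// Assumed sample data: two entries, the second without an enclosure.
$xml = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>One</title>
    <link rel="enclosure" href="http://myweb.com/uploads/300px-300px.jpg" type="image/jpeg"/>
  </entry>
  <entry>
    <title>Two</title>
  </entry>
</feed>
XML;

$feed = simplexml_load_string($xml);
$urls = [];
foreach ($feed->entry as $entry) {
    $urls[] = enclosureHref($entry);
}
var_dump($urls);
```

Returning null for a missing enclosure is a design choice made for this sketch; an empty string or an exception would work equally well, depending on how the caller wants to treat entries without an image.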

Upvotes: 2

ThW

Reputation: 19502

That looks like part of an Atom feed, which means it has a namespace. To use XPath on XML with namespaces, you have to register an alias/prefix for the namespace. This is a little complex with SimpleXML: you have to do it on each element on which you call the xpath() method, and that method always returns an array of SimpleXMLElement objects.

$feed = simplexml_load_string($xml);

foreach($feed->entry as $product) {
  $product->registerXpathNamespace('atom', 'http://www.w3.org/2005/Atom');
  // Relative path: a leading // would match links in every entry of the feed
  var_dump((string)$product->xpath('atom:link[@rel="enclosure"]')[0]['href']);
}

Demo: https://eval.in/170439
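One detail worth checking when running XPath inside such a loop is where the expression starts. The sketch below compares a path that begins with // against a path relative to the current entry, using a hypothetical two-entry feed (the a.jpg and b.jpg URLs are invented for illustration):

```php
<?php
// Assumed sample data: two entries with different enclosure images.
$xml = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>One</title>
    <link rel="enclosure" href="http://myweb.com/uploads/a.jpg" type="image/jpeg"/>
  </entry>
  <entry>
    <title>Two</title>
    <link rel="enclosure" href="http://myweb.com/uploads/b.jpg" type="image/jpeg"/>
  </entry>
</feed>
XML;

$feed = simplexml_load_string($xml);

$absolute = [];
$relative = [];
foreach ($feed->entry as $entry) {
    $entry->registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom');
    // "//" starts at the document root, so [0] is the first link in the feed.
    $absolute[] = (string) $entry->xpath('//atom:link[@rel="enclosure"]')[0]['href'];
    // A relative path starts at the current entry.
    $relative[] = (string) $entry->xpath('atom:link[@rel="enclosure"]')[0]['href'];
}

var_dump($absolute); // a.jpg twice
var_dump($relative); // a.jpg, then b.jpg
```

With a single entry the two forms give the same result, which is why the difference is easy to miss when testing against a small snippet.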

With DOMXPath this is easier: the namespaces only need to be registered on the DOMXPath object once, and DOMXPath::evaluate() can return scalar values. The second argument is the context node for the XPath expression:

$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
$xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');

foreach($xpath->evaluate('//atom:entry') as $product) {
  var_dump($xpath->evaluate('string(atom:link[@rel="enclosure"]/@href)', $product));
}

Demo: https://eval.in/170444
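For a self-contained run of the DOMXPath approach, the following sketch supplies its own inline feed (assumed sample data; the second entry has no enclosure on purpose) and collects the title and image URL per entry. Because string() is applied inside the XPath expression, an entry without a matching link yields an empty string rather than an error:

```php
<?php
// Assumed sample data: the second entry has no enclosure link.
$xml = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>My Title</title>
    <link rel="enclosure" href="http://myweb.com/uploads/300px-300px.jpg" type="image/jpeg"/>
  </entry>
  <entry>
    <title>No Image</title>
  </entry>
</feed>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
// Registered once on the DOMXPath object, reused for every query.
$xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');

$rows = [];
foreach ($xpath->evaluate('//atom:entry') as $entry) {
    // Both expressions are relative to $entry, passed as the context node.
    $title = $xpath->evaluate('string(atom:title)', $entry);
    $image = $xpath->evaluate('string(atom:link[@rel="enclosure"]/@href)', $entry);
    $rows[] = [$title, $image];
}
var_dump($rows);
```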

Upvotes: 1
