Reputation: 2132
Imagine the following XML:
<?xml version="1.0" encoding="utf-8" ?>
<feed>
<title type="text">This is my title</title>
<id>123456</id>
<content>Hello World</content>
</feed>
Let's say we want to access the <id>
value as a string. One would think that could be accessed with:
$xml = simplexml_load_file('file.xml');
print_r($xml->id);
But that's not right, we'll end up just printing a new SimpleXMLElement, like so:
SimpleXMLElement Object
(
[0] => 123456
)
So we get back a new object of which 0 is a property, I guess? There are two ways that seem natural to access this, neither of which works:
//throws an error
$xml = simplexml_load_file('file.xml');
print_r($xml->id->0);
//prints "SimpleXMLElement Object ( [0] => 123456 )"
$xml = simplexml_load_file('file.xml');
print_r($xml->id[0]);
So that leads to question A: just what is inside of $xml->id
? It kind of acts like an object, but it also kind of acts like an array. Ultimately, there are two ways to access this value:
//prints '123456'
$xml = simplexml_load_file('file.xml');
$id = (array) $xml->id;
print_r($id[0]);
//prints '123456'
$xml = simplexml_load_file('file.xml');
print_r($xml->id->__toString());
Of these, the second feels more "right" to me, but I'm left wondering just what is going on here. Question B: Why are $xml->id
and $xml->id[0]
identical? For that matter, why are $xml->id[0]
and $xml->id[0][0][0][0][0][0]
also identical?
Imagine the following XML:
<?xml version="1.0" encoding="utf-8" ?>
<feed>
<title type="text">This is my title</title>
<tag>news</tag>
<tag>sports</tag>
<content>Hello World</content>
</feed>
Suppose you want to get a list of all tags. This is where I start to get really confused.
$xml = simplexml_load_file('file.xml');
print_r($xml->tag);
This has the following result:
SimpleXMLElement Object
(
[0] => news
)
That's sensible enough, but this is the part I don't get. We can also do this:
$xml = simplexml_load_file('file.xml');
print_r($xml->tag[1]);
Which prints out this:
SimpleXMLElement Object
(
[0] => sports
)
What the hell? If both tags are available inside $xml->tag
then, Question C: why doesn't print_r($xml->tag)
print the following:
SimpleXMLElement Object
(
[0] => news
[1] => sports
)
I guess $xml->tag
implies $xml->tag[0]
? Ultimately, the only way I can see to access a list of all the <tag> elements
is with xpath:
$xml = simplexml_load_file('file.xml');
$tags = $xml->xpath('//tag');
//$tags is now an array of objects. We want an array of strings.
foreach ($tags as &$tag) {
$tag = (string) $tag;
}
print_r($tags);
Which outputs:
Array
(
[0] => news
[1] => sports
)
But that honestly seems like a lot of code to do something pretty simple and common. So Question D: is there a better way to get a list of values from XML natively in PHP?
Upvotes: 2
Views: 140
Reputation: 197795
Problem 1: Accessing innerXHTML as a string
You can get the text content of any SimpleXMLElement as a string by casting it to string:
print_r((string) $xml->id); # gives 123456
So, how does this work? In PHP, any object can be made castable to a string by implementing the __toString()
magic method. SimpleXMLElement is an internal class that does the same.
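To make the mechanism concrete, here is a minimal sketch of a user-land class with __toString() next to the SimpleXML cast. The Celsius class is hypothetical and exists only for illustration; the sketch assumes PHP 7.4 or later for typed properties.

```php
<?php
// Hypothetical class, used only to illustrate __toString(); not part of SimpleXML.
final class Celsius
{
    private float $degrees;

    public function __construct(float $degrees)
    {
        $this->degrees = $degrees;
    }

    // PHP calls this whenever the object is used in a string context.
    public function __toString(): string
    {
        return sprintf('%.1f C', $this->degrees);
    }
}

$temperature = new Celsius(21.5);
$asString = (string) $temperature; // explicit cast triggers __toString()

$xml = simplexml_load_string('<feed><id>123456</id></feed>');
$id = (string) $xml->id;           // same kind of cast, handled internally by SimpleXMLElement
```

In both cases the cast yields a plain PHP string, which is why string comparisons and string functions behave as expected afterwards.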
And why does the print_r($xml->id)
look so strange? Well that is because print_r
and var_dump
on SimpleXMLElement objects are liars, so do not rely on them too much. SimpleXMLElement can get away with this because it is an internal class: it can bend rules that objects written in PHP userland have to follow.
question A: just what is inside of $xml->id?
That is just a SimpleXMLElement. It acts like an object that implements ArrayAccess, the interface that lets you write objects that can be accessed like arrays. SimpleXMLElement does this as well.
It also overrides the standard cast to array. The exact rules that SimpleXMLElement follows when cast to an array are not very intuitive. The best listing I have written so far is SimpleXML and JSON Encode in PHP – Part I + II, since the rules are the same as for JSON encoding; read it only if you are interested, because you normally don't need that level of detail.
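For comparison, this is roughly what array-style access looks like when you write it yourself in userland. The Settings class is hypothetical, and the sketch assumes PHP 8.0 or later for the mixed type. It only illustrates the ArrayAccess interface; it says nothing about how SimpleXMLElement is implemented internally.

```php
<?php
// Hypothetical user-land class illustrating ArrayAccess (assumes PHP 8.0+).
final class Settings implements ArrayAccess
{
    private array $values = [];

    public function offsetExists(mixed $offset): bool
    {
        return isset($this->values[$offset]);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->values[$offset] ?? null;
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        if ($offset === null) {
            $this->values[] = $value;        // handles $settings[] = ...
        } else {
            $this->values[$offset] = $value; // handles $settings['key'] = ...
        }
    }

    public function offsetUnset(mixed $offset): void
    {
        unset($this->values[$offset]);
    }
}

$settings = new Settings();
$settings['theme'] = 'dark';               // calls offsetSet()
$theme = $settings['theme'];               // calls offsetGet()
$hasTheme = isset($settings['theme']);     // calls offsetExists()
unset($settings['theme']);                 // calls offsetUnset()
$stillHasTheme = isset($settings['theme']);
```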
Question B: Why are $xml->id and $xml->id[0] identical?
This is because $xml->id
is an alias to the first <id>
element, which is also accessible by its numeric index: $xml->id[0]
. By the way, this allows you to access the element itself even when it is held in a single variable:
$id = $xml->id;
# change inner text
$id[0] = 'hello'; // $id = 'hello'; would have turned $id into a string
# remove the node from the tree
unset($id[0]); // unset($id); would have unset the $id variable only
The $id[0]
or $id->{0}
notation is also sometimes called simplexml self-reference. A longer answer about it with some more references is: https://stackoverflow.com/a/16062633/367456 .
Btw, the two are not identical. They are just two ways to access the same XML node in the document.
And for that matter: $xml->id->{0}
would work, too. As would $xml->id[0]->{0}
and even $xml->id->{0}[0][0]->{0}[0]->{0}[0][0]->{0}[0]->{0}[0][0]->{0}[0]
and so on and so forth.
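A short sketch of the self-reference in action, using the XML from the question loaded from a string instead of file.xml. The variable names are only illustrative.

```php
<?php
$xml = simplexml_load_string(
    '<feed><title type="text">This is my title</title>'
    . '<id>123456</id><content>Hello World</content></feed>'
);

// Three ways to reach the same <id> node.
$viaProperty = (string) $xml->id;
$viaIndex    = (string) $xml->id[0];
$viaChain    = (string) $xml->id[0][0][0];

// Writing through the self-reference changes the node in the document,
// not just the local variable.
$id = $xml->id;
$id[0] = 'hello';
$afterWrite = (string) $xml->id;

// Unsetting the self-reference removes the node from the tree.
unset($id[0]);
$remaining = count($xml->id);
```

After the unset, the document no longer contains an <id> element, which is why the count drops to zero.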
Problem 2: Dealing with multiple nodes of the same type
Question C: why doesn't print_r($xml->tag) print the following:
That is because of the simplification SimpleXML performs: it can't do both, so it has to make a decision. Normally with $xml->tag
you want to access the first element named <tag>
and not all tags. However by casting you can give SimpleXML a hint what you want:
By casting to string, you basically say: give me the first element's value.
(string) $xml->tag; # news
By casting to array, you say: give me all elements values:
(array) $xml->tag # Array([0] => news, [1] => sports)
Which is perhaps already what you're asking for in
Question D: is there a better way to get a list of values from XML natively in PHP?
That highly depends on your needs. As you have realized, the "simple" in SimpleXML comes with a lot of magic and is not always straightforward to understand. It's a condensed interface for typical XML parsing needs, but it cannot cover every case unambiguously.
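If all you need is a list of strings, two compact options within SimpleXML are iterating with foreach and mapping the xpath() result. This is a sketch using the XML from the question loaded from a string; the variable names are only illustrative.

```php
<?php
$xml = simplexml_load_string(
    '<feed><title type="text">This is my title</title>'
    . '<tag>news</tag><tag>sports</tag><content>Hello World</content></feed>'
);

// Option 1: $xml->tag is traversable, so foreach visits every <tag> element.
$tags = [];
foreach ($xml->tag as $tag) {
    $tags[] = (string) $tag;
}

// Option 2: xpath() returns an array of SimpleXMLElement objects;
// strval converts each one to its text content.
$viaXpath = array_map('strval', $xml->xpath('//tag'));

// SimpleXMLElement is countable as well.
$tagCount = count($xml->tag);
```

The array_map variant avoids the by-reference loop from the question, which is easy to get wrong if $tag is reused later.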
The sister library DOM gives you more detailed access through its DOMDocument-based API, which normally allows more fine-grained control if you need it.
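For comparison, a minimal sketch of the same task with DOM, again using the XML from the question as a string. Whether this is preferable depends on how much control you need; it is more explicit but also more verbose.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<feed><title type="text">This is my title</title>'
    . '<tag>news</tag><tag>sports</tag><content>Hello World</content></feed>'
);

// getElementsByTagName() returns a DOMNodeList that always holds all matches.
$tags = [];
foreach ($doc->getElementsByTagName('tag') as $node) {
    $tags[] = $node->textContent;
}

// DOMXPath::evaluate() can return a plain string directly.
$xpath = new DOMXPath($doc);
$firstTag = $xpath->evaluate('string(/feed/tag[1])');
```

If you already hold a SimpleXMLElement, dom_import_simplexml() converts it to a DOM node, so you can mix both APIs on the same document.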
Upvotes: 3