Reputation: 1916
Hi all,
I have an inputXml.xml file as below:
<content>
    <item name="book" label="Book">
        <![CDATA[ book name ]]>
    </item>
    <item name="price" label="Price">
        <![CDATA[ 35 ]]>
    </item>
</content>
When I use the code below to parse the XML file:
$obj = simplexml_load_string(file_get_contents($inputXml),'SimpleXMLElement', LIBXML_NOCDATA);
$json = json_encode($obj);
$inputArray = json_decode($json,TRUE);
I get the array like below:
[content] => Array
    (
        [item] => Array
            (
                [0] => book name
                [1] => 35
            )
    )
Is it possible to get an associative array that uses the value of the "name" or "label" attribute as the key, like below:
[content] => Array
    (
        [item] => Array
            (
                [book] => book name
                [price] => 35
            )
    )
Upvotes: 1
Views: 5129
Reputation: 197767
First of all, you've been misled by some other code into thinking that you need json_encode and json_decode to get the array out of a SimpleXMLElement. Instead, you only need to cast to array:
$inputArray = (array) $obj;
Then you've got the problem that the array-serialization you're looking for is not the default serialization that the SimpleXMLElement provides with that XML.
Another, minor problem is the dependency on LIBXML_NOCDATA, because without it the result wouldn't even come close to that format. Not depending on that flag (and therefore on whether the underlying XML uses CDATA to encode element values) would be useful too, as it makes the code more stable.
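To illustrate why the flag is avoidable: casting an element to string returns its text with CDATA content included, whether or not LIBXML_NOCDATA was passed. A minimal sketch, assuming $xml is an inline copy of the XML from the question (the trim() calls are my addition, to drop the padding inside the CDATA sections):

```php
<?php
// Inline copy of the question's XML (assumption: same structure as inputXml.xml).
$xml = <<<XML
<content>
<item name="book" label="Book"><![CDATA[ book name ]]></item>
<item name="price" label="Price"><![CDATA[ 35 ]]></item>
</content>
XML;

// Note: no LIBXML_NOCDATA is passed here.
$obj = new SimpleXMLElement($xml);

// The string cast yields the element's text, CDATA sections included.
$book  = trim((string) $obj->item[0]);
$price = trim((string) $obj->item[1]);

echo $book, "\n", $price, "\n";
```

So as long as the values are read via a string cast, the code behaves the same for CDATA and for plain text nodes.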
As SimpleXMLElement does not provide the behavior you want, you normally have two options here: extend SimpleXMLElement or decorate it. I normally suggest decoration, as extension is limited. E.g. you cannot interfere with the (array) cast via extension; you can, however, for JSON serialization. But that's not what you're looking for: you're looking for array serialization.
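For completeness, here is roughly what the extension route looks like for JSON serialization. This is only a sketch: the class name ContentElement is made up for this example, and it assumes the XML from the question. The subclass implements JsonSerializable, so json_encode() picks up the custom format, while an (array) cast is unaffected by it:

```php
<?php
// Hypothetical subclass: only json_encode() is influenced, not the (array) cast.
class ContentElement extends SimpleXMLElement implements JsonSerializable
{
    public function jsonSerialize(): array
    {
        $data = array();
        foreach ($this->item as $item) {
            // key: value of the "name" attribute, value: trimmed element text
            $data[(string) $item['name']] = trim((string) $item);
        }
        return $data;
    }
}

// Inline copy of the question's XML (assumption).
$xml = '<content>'
     . '<item name="book" label="Book"><![CDATA[ book name ]]></item>'
     . '<item name="price" label="Price"><![CDATA[ 35 ]]></item>'
     . '</content>';

// The second argument tells SimpleXML which class to instantiate.
$obj  = simplexml_load_string($xml, 'ContentElement');
$json = json_encode($obj);

echo $json, "\n";
```

Decoding that JSON again would give the associative array, but that is the same round trip as in the question, which is why the decoration approach is described next.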
So for a kind-of-standard array serialization of a SimpleXMLElement you could implement this with a serializer and a strategy object on how to array-serialize a specific element.
This first needs the serializer:
interface ArraySerializer
{
    public function arraySerialize();
}

class SimpleXMLArraySerializer implements ArraySerializer
{
    /**
     * @var SimpleXMLElement
     */
    private $subject;

    /**
     * @var SimpleXMLArraySerializeStrategy
     */
    private $strategy;

    // explicit nullable type; an implicit "= NULL" nullable is deprecated in newer PHP versions
    public function __construct(SimpleXMLElement $element, ?SimpleXMLArraySerializeStrategy $strategy = NULL) {
        $this->subject  = $element;
        $this->strategy = $strategy ?: new DefaultSimpleXMLArraySerializeStrategy();
    }

    public function arraySerialize() {
        $strategy = $this->getStrategy();
        return $strategy->serialize($this->subject);
    }

    /**
     * @return SimpleXMLArraySerializeStrategy
     */
    public function getStrategy() {
        return $this->strategy;
    }
}
This array serializer is still missing the actual serialization logic. That has been delegated to a strategy so that it can easily be exchanged later on. Here is a default strategy to do so:
abstract class SimpleXMLArraySerializeStrategy
{
    abstract public function serialize(SimpleXMLElement $element);
}

class DefaultSimpleXMLArraySerializeStrategy extends SimpleXMLArraySerializeStrategy
{
    public function serialize(SimpleXMLElement $element) {
        $array   = array();
        $grouped = array(); // names that already hold a list of same-named siblings

        // create array of child elements if any. group on duplicate names as an array.
        foreach ($element as $name => $child) {
            $value = $this->serialize($child);
            if (!array_key_exists($name, $array)) {
                $array[$name] = $value;
            } elseif (isset($grouped[$name])) {
                $array[$name][] = $value;
            } else {
                $array[$name]   = array($array[$name], $value);
                $grouped[$name] = true;
            }
        }

        // handle SimpleXMLElement text values.
        if (!$array) {
            $array = (string) $element;
            // return empty elements as NULL (self-closing or empty tags);
            // strict comparison so that a text value of "0" is kept
            if ($array === '') {
                $array = NULL;
            }
        }

        return $array;
    }
}
This object contains a common way to convert a SimpleXMLElement into an array. It behaves comparably to what SimpleXMLElement with LIBXML_NOCDATA already does for your XML. However, it does not have the problem with CDATA. To show this, the following example already gives the output you have:
$obj = new SimpleXMLElement($xml);
$serializer = new SimpleXMLArraySerializer($obj);
print_r($serializer->arraySerialize());
Now that the array serialization has been implemented in types of its own, it's easy to change it according to your needs. For the content element you need a different strategy to turn it into an array. It is also far simpler:
class ContentXMLArraySerializeStrategy extends SimpleXMLArraySerializeStrategy
{
    public function serialize(SimpleXMLElement $element) {
        $array = array();
        foreach ($element->item as $item) {
            $array[(string) $item['name']] = (string) $item;
        }
        return array('item' => $array);
    }
}
What's left is to wire this into the SimpleXMLArraySerializer on the right condition, e.g. depending on the name of the element:
...
    /**
     * @return SimpleXMLArraySerializeStrategy
     */
    public function getStrategy() {
        if ($this->subject->getName() === 'content') {
            return new ContentXMLArraySerializeStrategy();
        }
        return $this->strategy;
    }
}
Now the same example from above:
$obj = new SimpleXMLElement($xml);
$serializer = new SimpleXMLArraySerializer($obj);
print_r($serializer->arraySerialize());
would give you the wanted output (beautified):
Array
(
    [item] => Array
        (
            [book] => book name
            [price] => 35
        )
)
As your XML probably only has this one element, I'd say such a level of abstraction might be a little much. However, if the XML is going to change and you actually need multiple array formats within the same document, this is a plausible way to go.
The default serialization I've used in my example is based on the decoration example in SimpleXML and JSON Encode in PHP – Part III and End.
Upvotes: 1
Reputation: 76395
I took a quick look at the SimpleXMLElement docs, which showed that it is actually quite easy to construct an array like you want:
$xml = simplexml_load_file($file, 'SimpleXMLElement', LIBXML_NOCDATA);
$result = array();//store assoc array here
foreach ($xml->item as $item)
{//iterate over item nodes
    if (isset($item['name']))
    {//attributes are accessible as array keys
        $result[(string) $item['name']] = (string) $item;//casts required!
    }
}
var_dump($result);
This works because SimpleXMLElement is Traversable, so you can iterate over its child nodes with foreach, and it lets you read attributes using array syntax. However, we do need to cast the values, because they're all instances of the SimpleXMLElement class.
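Since the question mentions both "name" and "label" as possible keys, the same loop can be wrapped so the attribute is a parameter. This is just a sketch: itemsByAttribute is a name I made up, the XML is an inline copy of the one in the question, and the trim() is my addition to strip the padding around the CDATA text.

```php
<?php
/**
 * Hypothetical helper: builds array(attribute value => element text)
 * from the <item> children of the given element.
 */
function itemsByAttribute(SimpleXMLElement $xml, $attribute)
{
    $result = array();
    foreach ($xml->item as $item) {
        if (isset($item[$attribute])) {
            // both the attribute and the element are objects, so cast them
            $result[(string) $item[$attribute]] = trim((string) $item);
        }
    }
    return $result;
}

// Inline copy of the question's XML (assumption).
$xml = simplexml_load_string(
    '<content>'
    . '<item name="book" label="Book"><![CDATA[ book name ]]></item>'
    . '<item name="price" label="Price"><![CDATA[ 35 ]]></item>'
    . '</content>'
);

$byName  = itemsByAttribute($xml, 'name');
$byLabel = itemsByAttribute($xml, 'label');

print_r($byName);
print_r($byLabel);
```

Items that lack the requested attribute are simply skipped, which mirrors the isset() check in the snippet above.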
The code above is a simplified version of what I had written initially:
$xml = simplexml_load_file($fileName, 'SimpleXMLElement', LIBXML_NOCDATA);
$result = array();
foreach ($xml as $name => $node)
{
    if ($name === 'item')
    {
        $key = false;
        foreach ($node->attributes() as $attrName => $attr)
        {//separate variable, so the outer $name isn't overwritten
            if ($attrName == 'name')
            {
                $key = (string) $attr;//attr is object, still
                break;
            }
        }
        if ($key !== false)
            $result[$key] = (string) $node;
    }
}
This works, too. However, I think you'll agree that the code looks quite messy. I'd stick to the first version I posted here.
Initial answer (using DOMDocument)
I'll look into how to do this using SimpleXML, but for now, here's how I'd set about the business of getting CDATA values using the DOMDocument API:
$dom = new DOMDocument;
$dom->load($file);
//get items
$items = $dom->getElementsByTagName('item');
$cData = array();
foreach ($items as $node)
{
    if ($node->hasChildNodes())
    {
        foreach ($node->childNodes as $cNode)
        {
            if ($cNode->nodeType === XML_CDATA_SECTION_NODE)
                $cData[] = $cNode->textContent;//get contents
        }
    }
}
Use this in combination with other methods, like $node->attributes->getNamedItem('name') to get a node's attribute, or $node->attributes->getNamedItem('name')->nodeValue to get that attribute's value.
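Putting those two pieces together, a rough sketch of building the associative array with DOMDocument could look like this. It uses DOMElement's hasAttribute()/getAttribute() as a shortcut for the getNamedItem() calls, loads an inline copy of the question's XML with loadXML() instead of a file, and trims the text; those choices are mine, not part of the code above.

```php
<?php
// Inline copy of the question's XML (assumption).
$xml = '<content>'
     . '<item name="book" label="Book"><![CDATA[ book name ]]></item>'
     . '<item name="price" label="Price"><![CDATA[ 35 ]]></item>'
     . '</content>';

$dom = new DOMDocument;
$dom->loadXML($xml);

$result = array();
foreach ($dom->getElementsByTagName('item') as $node)
{
    if ($node->hasAttribute('name'))
    {
        // textContent concatenates the text and CDATA children of the node
        $result[$node->getAttribute('name')] = trim($node->textContent);
    }
}

print_r($result);
```

Because textContent covers both plain text and CDATA children, this version does not need the explicit XML_CDATA_SECTION_NODE check.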
I admit, the DOMDocument API looks quite verbose (because it is), and it feels a bit clunky (as it has always done), but it's really not too difficult to figure out once you've read the manual.
Upvotes: 0