aborted

Reputation: 4541

How to convert XML string to PHP array with a different structure?

I have a method that converts an XML string into a PHP array, using separate keys for each element's value and its attributes. However, when an element has several children with the same tag name, the resulting array isn't structured the way I need, and I can't work out how to change the method to fix that.

This is what the method looks like:

/**
 * Converts an XML string to an array
 *
 * @param $xmlString
 * @return array
 */
private function parseXml($xmlString)
{
    $doc = new DOMDocument;
    $doc->loadXML($xmlString);
    $root = $doc->documentElement;
    $output[$root->tagName] = $this->domNodeToArray($root, $doc);

    return $output;
}

/**
 * @param $node
 * @param $xmlDocument
 * @return array|string
 */
private function domNodeToArray($node, $xmlDocument)
{
    $output = [];
    switch ($node->nodeType)
    {
        case XML_CDATA_SECTION_NODE:
        case XML_TEXT_NODE:
            $output = trim($node->textContent);
            break;
        case XML_ELEMENT_NODE:
            for ($i = 0, $m = $node->childNodes->length; $i < $m; $i++)
            {
                $child = $node->childNodes->item($i);
                $v = $this->domNodeToArray($child, $xmlDocument);

                if (isset($child->tagName))
                {
                    $t = $child->tagName;

                    if (!isset($output['value'][$t]))
                    {
                        $output['value'][$t] = [];
                    }
                    $output['value'][$t][] = $v;
                }
                else if ($v || $v === '0')
                {
                    $output['value'] = htmlspecialchars((string)$v, ENT_XML1 | ENT_COMPAT, 'UTF-8');
                }
            }

            if (isset($output['value']) && $node->attributes->length && !is_array($output['value']))
            {
                $output = ['value' => $output['value']];
            }

            if (!$node->attributes->length && isset($output['value']) && !is_array($output['value']))
            {
                $output = ['attributes' => [], 'value' => $output['value']];
            }

            if ($node->attributes->length)
            {
                $a = [];
                foreach ($node->attributes as $attrName => $attrNode)
                {
                    $a[$attrName] = (string)$attrNode->value;
                }
                $output['attributes'] = $a;
            }
            else
            {
                $output['attributes'] = [];
            }

            if (isset($output['value']) && is_array($output['value']))
            {
                foreach ($output['value'] as $t => $v)
                {
                    if (is_array($v) && count($v) == 1 && $t != 'attributes')
                    {
                        $output['value'][$t] = $v[0];
                    }
                }
            }
            break;
    }

    return $output;
}

Here is some example XML:

<?xml version="1.0" encoding="UTF-8"?>
<characters>
   <character>
      <name2>Sno</name2>
      <friend-of>Pep</friend-of>
      <since>1950-10-04</since>
      <qualification>extroverted beagle</qualification>
   </character>
   <character>
      <name2>Pep</name2>
      <friend-of>Sno</friend-of>
      <since>1966-08-22</since>
      <qualification>bold, brash and tomboyish</qualification>
   </character>
</characters>

Running the method with that XML as its parameter produces this array:

array:1 [▼
  "characters" => array:2 [▼
    "value" => array:1 [▼
      "character" => array:2 [▼
        0 => array:2 [▼
          "value" => array:4 [▼
            "name2" => array:2 [▼
              "attributes" => []
              "value" => "Sno"
            ]
            "friend-of" => array:2 [▼
              "attributes" => []
              "value" => "Pep"
            ]
            "since" => array:2 [▼
              "attributes" => []
              "value" => "1950-10-04"
            ]
            "qualification" => array:2 [▼
              "attributes" => []
              "value" => "extroverted beagle"
            ]
          ]
          "attributes" => []
        ]
        1 => array:2 [▼
          "value" => array:4 [▼
            "name2" => array:2 [▼
              "attributes" => []
              "value" => "Pep"
            ]
            "friend-of" => array:2 [▼
              "attributes" => []
              "value" => "Sno"
            ]
            "since" => array:2 [▼
              "attributes" => []
              "value" => "1966-08-22"
            ]
            "qualification" => array:2 [▼
              "attributes" => []
              "value" => "bold, brash and tomboyish"
            ]
          ]
          "attributes" => []
        ]
      ]
    ]
    "attributes" => []
  ]
]

What I want it to produce instead is this (the indentation may be off):

array:1 [▼
  "characters" => array:2 [▼
    "value" => array:2 [▼
      0 => array:1 [▼
        "character" => array:2 [▼
          "value" => array:4 [▼
            "name2" => array:2 [▼
              "attributes" => []
              "value" => "Sno"
            ]
            "friend-of" => array:2 [▼
              "attributes" => []
              "value" => "Pep"
            ]
            "since" => array:2 [▼
              "attributes" => []
              "value" => "1950-10-04"
            ]
            "qualification" => array:2 [▼
              "attributes" => []
              "value" => "extroverted beagle"
            ]
          ]
          "attributes" => []
        ]
      ]
      1 => array:1 [▼
        "character" => array:2 [▼
          "value" => array:4 [▼
            "name2" => array:2 [▼
              "attributes" => []
              "value" => "Pep"
            ]
            "friend-of" => array:2 [▼
              "attributes" => []
              "value" => "Sno"
            ]
            "since" => array:2 [▼
              "attributes" => []
              "value" => "1966-08-22"
            ]
            "qualification" => array:2 [▼
              "attributes" => []
              "value" => "bold, brash and tomboyish"
            ]
          ]
          "attributes" => []
        ]
      ]
    ]
    "attributes" => []
  ]
]

So basically, I want the value key of the characters key to be an array of two items, each wrapping one character key. This should only happen when the same element occurs more than once on the same branch. The current structure, where the character key holds an array of two elements, doesn't work in my situation.

I haven't managed to alter the method above to do this, and I'm not sure which approach to take. Reshaping an array built from a DOMDocument instance like this seems quite complicated.

Upvotes: 1

Views: 148

Answers (2)

Chin Leung

Reputation: 14921

I've made some changes to your function, though I'm not sure this is exactly what you need.

private function domNodeToArray($node, $xmlDocument)
{
    $output = ['value' => [], 'attributes' => []];

    switch ($node->nodeType) {
    case XML_CDATA_SECTION_NODE:
    case XML_TEXT_NODE:
        $output = trim($node->textContent);
        break;
    case XML_ELEMENT_NODE:
        for ($i = 0, $m = $node->childNodes->length; $i < $m; $i++) {
            $child = $node->childNodes->item($i);
            $v = $this->domNodeToArray($child, $xmlDocument);

            if (isset($child->tagName)) {
                $t = $child->tagName;

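                if (isset($output['value'][$t])) {
                    // Second occurrence: move the first one into its own slot
                    $output['value'][] = [$t => $output['value'][$t]];
                    $output['value'][] = [$t => $v];
                    unset($output['value'][$t]);
                } elseif (isset($output['value'][0][$t])) {
                    // Third or later occurrence: append another slot
                    $output['value'][] = [$t => $v];
                } else {
                    $output['value'][$t] = $v;
                }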
            } elseif (($v && is_string($v)) || $v === '0') {
                $output['value'] = htmlspecialchars((string)$v, ENT_XML1 | ENT_COMPAT, 'UTF-8');
            }
        }

        if ($node->attributes->length) {
            foreach ($node->attributes as $attrName => $attrNode) {
                $output['attributes'][$attrName] = (string) $attrNode->value;
            }
        }

        break;
    }

    return $output;
}

Output

array:1 [▼
  "characters" => array:2 [▼
    "value" => array:2 [▼
      0 => array:1 [▼
        "character" => array:2 [▼
          "value" => array:4 [▼
            "name2" => array:2 [▼
              "value" => "Sno"
              "attributes" => []
            ]
            "friend-of" => array:2 [▼
              "value" => "Pep"
              "attributes" => []
            ]
            "since" => array:2 [▼
              "value" => "1950-10-04"
              "attributes" => []
            ]
            "qualification" => array:2 [▼
              "value" => "extroverted beagle"
              "attributes" => []
            ]
          ]
          "attributes" => []
        ]
      ]
      1 => array:1 [▼
        "character" => array:2 [▼
          "value" => array:4 [▼
            "name2" => array:2 [▼
              "value" => "Pep"
              "attributes" => []
            ]
            "friend-of" => array:2 [▼
              "value" => "Sno"
              "attributes" => []
            ]
            "since" => array:2 [▼
              "value" => "1966-08-22"
              "attributes" => []
            ]
            "qualification" => array:2 [▼
              "value" => "bold, brash and tomboyish"
              "attributes" => []
            ]
          ]
          "attributes" => []
        ]
      ]
    ]
    "attributes" => []
  ]
]

Upvotes: 1

Nigel Ren

Reputation: 57121

The problem is deciding when to add a new level and when to simply carry on adding data. I've changed this logic and added comments to the code to explain what happens and when:

private function domNodeToArray($node, $xmlDocument)
{
    $output = [];
    switch ($node->nodeType)
    {
        case XML_CDATA_SECTION_NODE:
        case XML_TEXT_NODE:
            $output = trim($node->textContent);
            break;
        case XML_ELEMENT_NODE:
            for ($i = 0, $m = $node->childNodes->length; $i < $m; $i++)
            {
                $child = $node->childNodes->item($i);
                $v = $this->domNodeToArray($child, $xmlDocument);

                if (isset($child->tagName))
                {
                    $t = $child->tagName;

//                     if (!isset($output['value'][$t]))
//                     {
//                         $output['value'][$t] = [];
//                     }
                    // If the element already exists
                    if (isset($output['value'][$t]))
                    {
                        // Copy the existing value to new level
                        $output['value'][] = [$t => $output['value'][$t]];
                        // Add in new value
                        $output['value'][] = [$t => $v];
                        // Remove old element
                        unset($output['value'][$t]);
                    }
                    // If this has already been added at a new level
                    elseif (isset($output['value'][0][$t]))
                    {
                        // Add it to existing extra level
                        $output['value'][] = [$t => $v];
                    }
                    else
                    {
                        $output['value'][$t] = $v;
                    }
                }
                else if ($v || $v === '0')
                {
                    $output['value'] = htmlspecialchars((string)$v, ENT_XML1 | ENT_COMPAT, 'UTF-8');
                }
            }

            if (isset($output['value']) && $node->attributes->length && !is_array($output['value']))
            {
                $output = ['value' => $output['value']];
            }

            if (!$node->attributes->length && isset($output['value']) && !is_array($output['value']))
            {
                $output = ['attributes' => [], 'value' => $output['value']];
            }

            if ($node->attributes->length)
            {
                $a = [];
                foreach ($node->attributes as $attrName => $attrNode)
                {
                    $a[$attrName] = (string)$attrNode->value;
                }
                $output['attributes'] = $a;
            }
            else
            {
                $output['attributes'] = [];
            }
            break;
    }

    return $output;
}

I've tried it with...

<?xml version="1.0" encoding="UTF-8"?>
<characters>
   <character>
      <name2>Sno</name2>
      <friend-of>Pep</friend-of>
      <since>1950-10-04</since>
      <qualification>extroverted beagle</qualification>
   </character>
   <character>
      <name2>Pep</name2>
      <friend-of>Sno</friend-of>
      <since>1966-08-22</since>
      <qualification>bold, brash and tomboyish</qualification>
   </character>
   <character>
      <name2>Pep2</name2>
      <friend-of>Sno</friend-of>
      <since>1966-08-23</since>
      <qualification>boldish, brashish and tomboyish</qualification>
   </character>
</characters>

to check that the <character> elements are all added to the right level.
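
One limitation worth noting: the check on $output['value'][0][$t] depends on the order in which elements arrive, so a repeated element that follows an earlier repeated group (for example b, b, a, a, a) can end up partly wrapped and partly keyed by name. A possible way around this, sketched below as a standalone function rather than a drop-in replacement, is to count the child tag names first and only then build the array. The name elementToArray is hypothetical, and the sketch assumes elements contain either child elements or text, not both.

```php
<?php
// Hypothetical standalone sketch (not the original method): decide up front
// which child tag names repeat, then give each repeated child its own slot.
function elementToArray(DOMElement $element): array
{
    $output = ['value' => [], 'attributes' => []];

    // First pass: count how often each tag name occurs among direct children.
    $counts = [];
    foreach ($element->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $counts[$child->nodeName] = ($counts[$child->nodeName] ?? 0) + 1;
        }
    }

    // Second pass: repeated names get a numeric slot, unique names a named key.
    foreach ($element->childNodes as $child) {
        if ($child->nodeType !== XML_ELEMENT_NODE) {
            continue;
        }
        $name = $child->nodeName;
        if ($counts[$name] > 1) {
            $output['value'][] = [$name => elementToArray($child)];
        } else {
            $output['value'][$name] = elementToArray($child);
        }
    }

    // Leaf element (assumption: no mixed content): keep its trimmed text.
    if (!$counts) {
        $text = trim($element->textContent);
        if ($text !== '') {
            $output['value'] = htmlspecialchars($text, ENT_XML1 | ENT_COMPAT, 'UTF-8');
        }
    }

    foreach ($element->attributes as $attr) {
        $output['attributes'][$attr->nodeName] = $attr->nodeValue;
    }

    return $output;
}

// Usage with a shortened version of the XML from the question.
$doc = new DOMDocument();
$doc->loadXML(
    '<characters>'
    . '<character><name2>Sno</name2></character>'
    . '<character><name2>Pep</name2></character>'
    . '</characters>'
);
$result = [$doc->documentElement->tagName => elementToArray($doc->documentElement)];
echo $result['characters']['value'][1]['character']['value']['name2']['value'], PHP_EOL; // Pep
```

Because the counting happens before anything is stored, no existing entry ever has to be moved, at the cost of walking the child list twice.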

Upvotes: 1
