Joe

Reputation: 245

Parsing XML files with imports in PHP

I need to parse this XML file, which uses some custom tags:

<?xml version="1.0" encoding="utf-8"?>
<glz:Config xmlns:glz="http://www.glizy.org/dtd/1.0/">
    <glz:Import src="config.xml" />

    <glz:Group name="thumbnail">
        <glz:Param name="width" value="200" />
        <glz:Param name="height" value="*" />
    </glz:Group>
</glz:Config>

When the parser reaches the tag <glz:Import src="config.xml" />, it needs to parse the file config.xml, which contains the following:

<?xml version="1.0" encoding="utf-8"?>
<glz:Config xmlns:glz="http://www.glizy.org/dtd/1.0/">
    <glz:Group name="folder">
        <glz:Param name="width" value="100" />
        <glz:Param name="height" value="200" />
    </glz:Group>
</glz:Config>

The final result should be an array like this. It contains the values of both parsed files:

$result['thumbnail/width'] = 200;
$result['thumbnail/height'] = '*';
$result['folder/width'] = 100;
$result['folder/height'] = 200;

This is how I parse the XML so far. My problem is that I do not know how to merge the results of an imported file with the ones that have already been parsed. Here is my code:

function parseFile(){
            $reader = new XMLReader;
            $reader->open($this->fileName);

            while ($reader->read()){
                if ($reader->name == 'glz:Group')
                {
                    $groupName = $reader->getAttribute('name');
                    $reader->read();
                    $reader->read();

                    while ($reader->name == 'glz:Param')
                    {
                        if (strpos($reader->getAttribute('name'),'[]')  == true)
                        {
                            $arrayGroupName = substr($reader->getAttribute('name'), 0, -2);
                            if(empty($filters[$groupName.'/'.$arrayGroupName]))
                            {
                                $filters[$groupName.'/'.$arrayGroupName] = array();
                                array_push($filters[$groupName.'/'.$arrayGroupName],$this->castValue($reader->getAttribute('value')));
                                $this->result[$groupName."/".$arrayGroupName] = $filters[$groupName.'/'.$arrayGroupName];
                            }
                            else
                            {
                                array_push($filters[$groupName.'/'.$arrayGroupName],$this->castValue($reader->getAttribute('value')));
                                $this->result[$groupName."/".$arrayGroupName] = $filters[$groupName.'/'.$arrayGroupName];
                            }
                        }
                        else
                        {
                            $this->result[$groupName."/".$reader->getAttribute('name')] = $this->castValue($reader->getAttribute('value'));
                        }
                        $reader->read();
                        $reader->read();
                    }
                }
                else if ($reader->name == 'glz:Param')
                {
                    if (strpos($reader->getAttribute('name'),'[]')  == true)
                    {
                        $arrayGroupName = substr($reader->getAttribute('name'), 0, -2);
                        if(empty($filters[$arrayGroupName]))
                        {
                            $filters[$arrayGroupName] = array();
                            array_push($filters[$arrayGroupName],$this->castValue($reader->getAttribute('value')));
                            $this->result[$arrayGroupName] = $filters[$arrayGroupName];
                        }
                        else
                        {
                            array_push($filters[$arrayGroupName],$this->castValue($reader->getAttribute('value')));
                            $this->result[$arrayGroupName] = $filters[$arrayGroupName];
                        }
                    }
                    else
                    {
                        $this->result[$reader->getAttribute('name')] = $this->castValue($reader->getAttribute('value'));
                    }
                }
                else if ($reader->name == 'glz:Import')
                {
                    $file = $reader->getAttribute('src');
                    $newConfig = new Config($file);
                    $newConfig->parseFile();
                }
            }
            return $this->result;

        }

How can I merge the result of parsing the imported file into the existing result every time I find the <glz:Import> tag?

Thank you so much!

Upvotes: 0

Views: 84

Answers (2)

ThW

Reputation: 19512

You need to put the read logic into a function with the filename as an argument, so that it can call itself if it finds an Import element. Let the function return the values as an array and merge the results.

In DOM this is less complex:

function readConfigurationFile($fileName) {
  $document = new DOMDocument();
  $document->load($fileName);
  $xpath = new DOMXpath($document);
  $xpath->registerNamespace('g', 'http://www.glizy.org/dtd/1.0/');

  $result = [];
  foreach ($xpath->evaluate('/g:Config/*[self::g:Import or self::g:Group]') as $node) {
    switch ($node->localName) {
    case 'Import' :
      $result = array_merge($result, readConfigurationFile($node->getAttribute('src')));
      break;
    case 'Group' :
      $groupName = $node->getAttribute('name'); 
      foreach ($xpath->evaluate('g:Param', $node) as $paramNode) {
        $result[
          sprintf('%s/%s', $groupName, $paramNode->getAttribute('name'))
        ] = $paramNode->getAttribute('value');
      } 
      break;
    }
  }
  return $result;
}

var_dump(readConfigurationFile('main.xml'));

Output:

array(4) {
  ["folder/width"]=>
  string(3) "100"
  ["folder/height"]=>
  string(3) "200"
  ["thumbnail/width"]=>
  string(3) "200"
  ["thumbnail/height"]=>
  string(1) "*"
}
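
The key order in this output follows from how array_merge() treats string keys: entries are appended in argument order, and when both arrays share a key, the value from the later argument wins. A small sketch with made-up values (the duplicate key and the '50' are assumptions for illustration, not from the question) shows the effect of swapping the arguments:

```php
<?php
// Made-up sample data: both arrays define 'thumbnail/width'.
$main     = ['thumbnail/width' => '200'];
$imported = ['thumbnail/width' => '50', 'folder/width' => '100'];

// Imported values override the main file, because they come last.
$importedWins = array_merge($main, $imported);

// Main file values override the imported ones, because they come last.
$mainWins = array_merge($imported, $main);

var_dump($importedWins['thumbnail/width']); // string(2) "50"
var_dump($mainWins['thumbnail/width']);     // string(3) "200"
```

Which order is right depends on whether an importing file should be able to override the values it imports.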

The approach is the same in XMLReader, but a little more complex.

function readLargeConfigurationFile($fileName) {

  $reader = new XMLReader();
  $reader->open($fileName);

  $xmlns = 'http://www.glizy.org/dtd/1.0/';
  $document = new DOMDocument();
  $xpath = new DOMXpath($document);
  $xpath->registerNamespace('g', $xmlns);

  $result = [];

  // find the first Import or Group in the namespace
  do {
    $found = $reader->read();
  } while(
    $found && 
    !(
       $reader->namespaceURI === $xmlns && 
       ($reader->localName === 'Import' || $reader->localName === 'Group')
    )
  );

  while ($found) {
    switch ($reader->localName) {
    case 'Import' :
      $result = array_merge($result, readLargeConfigurationFile($reader->getAttribute('src')));
      break;
    case 'Group' :
      // expand Group into DOM for easier access
      $groupNode = $reader->expand($document);
      $groupName = $groupNode->getAttribute('name'); 
      foreach ($xpath->evaluate('g:Param', $groupNode) as $paramNode) {
        // read a Param
        $result[
          sprintf('%s/%s', $groupName, $paramNode->getAttribute('name'))
        ] = $paramNode->getAttribute('value');
      } 
      break;
    }

    // iterate sibling nodes to find the next Import or Group
    do {
      $found = $reader->next();
    } while(
      $found && 
      !(
        $reader->namespaceURI === $xmlns && 
        ($reader->localName === 'Import' || $reader->localName === 'Group')
      )
    ); 
  } 
  return $result;
}

var_dump(readLargeConfigurationFile('main.xml'));

Note that the XMLReader example does not use the $name property, because it contains the namespace prefix (alias) glz. Namespace prefixes are optional and can change, even within a single document. Use the $localName and $namespaceURI properties instead.
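
To illustrate why the prefix is unreliable, here is a minimal sketch that reads a document using the same namespace URI bound to a different prefix. The prefix cfg and the inline XML string are assumptions made up for this example:

```php
<?php
// Same namespace URI as in the question, but bound to the made-up prefix "cfg".
$xml = '<cfg:Config xmlns:cfg="http://www.glizy.org/dtd/1.0/">'
     . '<cfg:Group name="thumbnail"/>'
     . '</cfg:Config>';

$reader = new XMLReader();
$reader->XML($xml);

$names = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT) {
        // Collect the prefixed name, the local name and the namespace URI.
        $names[] = [$reader->name, $reader->localName, $reader->namespaceURI];
    }
}
$reader->close();

print_r($names);
```

A comparison against 'glz:Group' would miss the second element here, because its $name is 'cfg:Group', while $localName is still 'Group' and $namespaceURI is unchanged.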

With XMLReader::expand() you can expand the current node into DOM. A typical approach is to iterate only the outer nodes with XMLReader. If you know that a node and its descendants are small enough, you expand them into DOM for easier access.
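
Both functions above return every attribute value as a string, whereas the question's expected result holds integers. The question calls $this->castValue() without showing it, so the helper below is only a guess at what such a function might do, assuming that digit-only values should become integers and everything else stays a string:

```php
<?php
// Hypothetical helper: the real castValue() from the question is not shown.
function castValue(string $value)
{
    // Optionally signed, digit-only strings become integers.
    if (preg_match('/^-?\d+\z/', $value) === 1) {
        return (int) $value;
    }
    // Anything else, such as "*", is returned unchanged.
    return $value;
}

var_dump(castValue('200')); // int(200)
var_dump(castValue('*'));   // string(1) "*"
```

In either function it could be applied where the value is read, for example castValue($paramNode->getAttribute('value')).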

Upvotes: 1

lpp

Reputation: 26

As far as I understand your question, you need to refactor your code a bit.

Rewrite the parser function without references to $this->result and $this->fileName.

Redeclare those variables within your function as $result and $fileName, and add $fileName as a function argument.

Add another variable $result_config within the function.

When you reach the Import tag, call the function recursively instead of creating a new Config object:

- $file = $reader->getAttribute('src');
- $newConfig = new Config($file);
- $newConfig->parseFile();

+ $file = $reader->getAttribute('src');
+ $result_config = $this->parseFile($file);

Then finally merge the two results once both files have been parsed:

if ($result_config) {
    $result = array_merge($result_config, $result);
}
return $result;
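
Put together, the steps above might look roughly like the following standalone sketch. It is not the asker's class: parseConfigFile() is a made-up name, the '[]' array parameters and castValue() are left out, and it assumes that the src attribute is relative to the importing file.

```php
<?php
// Hypothetical standalone version of the refactoring described above.
function parseConfigFile(string $fileName): array
{
    $ns = 'http://www.glizy.org/dtd/1.0/';
    $reader = new XMLReader();
    if (!$reader->open($fileName)) {
        throw new RuntimeException("Cannot open $fileName");
    }

    $result = [];        // values defined in this file
    $resultConfig = [];  // values collected from imported files
    $groupName = null;   // name of the Group currently being read, if any

    while ($reader->read()) {
        if ($reader->namespaceURI !== $ns) {
            continue; // whitespace, text, or foreign elements
        }
        if ($reader->nodeType === XMLReader::END_ELEMENT) {
            if ($reader->localName === 'Group') {
                $groupName = null; // left the current group
            }
            continue;
        }
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        switch ($reader->localName) {
            case 'Import':
                // Assumption: src is resolved relative to the importing file.
                $src = dirname($fileName) . '/' . $reader->getAttribute('src');
                $resultConfig = array_merge($resultConfig, parseConfigFile($src));
                break;
            case 'Group':
                $groupName = $reader->isEmptyElement ? null : $reader->getAttribute('name');
                break;
            case 'Param':
                $key = $reader->getAttribute('name');
                if ($groupName !== null) {
                    $key = $groupName . '/' . $key;
                }
                $result[$key] = $reader->getAttribute('value');
                break;
        }
    }
    $reader->close();

    // Values from this file come last, so they override imported ones.
    return array_merge($resultConfig, $result);
}

// Usage: the two files from the question, written to a temporary directory.
$dir = sys_get_temp_dir() . '/glz_' . uniqid();
mkdir($dir);
file_put_contents("$dir/config.xml", '<glz:Config xmlns:glz="http://www.glizy.org/dtd/1.0/">
    <glz:Group name="folder">
        <glz:Param name="width" value="100" />
        <glz:Param name="height" value="200" />
    </glz:Group>
</glz:Config>');
file_put_contents("$dir/main.xml", '<glz:Config xmlns:glz="http://www.glizy.org/dtd/1.0/">
    <glz:Import src="config.xml" />
    <glz:Group name="thumbnail">
        <glz:Param name="width" value="200" />
        <glz:Param name="height" value="*" />
    </glz:Group>
</glz:Config>');

$result = parseConfigFile("$dir/main.xml");
print_r($result);
```

Because the function keeps all state in local variables and returns an array, each recursive call is independent, and the caller decides how the imported values are merged.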

Upvotes: 1
