Reputation: 12015

PHP xpath query on XML with default namespace binding

I have one solution to the subject problem, but it’s a hack and I’m wondering if there’s a better way to do this.

Below is a sample XML file and a PHP CLI script that executes an xpath query given as an argument. For this test case, the command line is:

./xpeg "//MainType[@ID=123]"

What seems most strange is this line, without which my approach doesn’t work:

$result->loadXML($result->saveXML($result));

As far as I know, this simply re-parses the modified XML, and it seems to me that this shouldn’t be necessary.

Is there a better way to perform xpath queries on this XML in PHP?

XML (note the binding of the default namespace):

<?xml version="1.0" encoding="utf-8"?>
<MyRoot
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.example.com/data http://www.example.com/data/MyRoot.xsd"
 xmlns="http://www.example.com/data">
  <MainType ID="192" comment="Bob's site">
    <Price>$0.20</Price>
    <TheUrl><![CDATA[http://www.example.com/path1/]]></TheUrl>
    <Validated>N</Validated>
  </MainType>
  <MainType ID="123" comment="Test site">
    <Price>$99.95</Price>
    <TheUrl><![CDATA[http://www.example.com/path2]]></TheUrl>
    <Validated>N</Validated>
  </MainType>
  <MainType ID="922" comment="Health Insurance">
    <Price>$600.00</Price>
    <TheUrl><![CDATA[http://www.example.com/eg/xyz.php]]></TheUrl>
    <Validated>N</Validated>
  </MainType>
  <MainType ID="389" comment="Used Cars">
    <Price>$5000.00</Price>
    <TheUrl><![CDATA[http://www.example.com/tata.php]]></TheUrl>
    <Validated>N</Validated>
  </MainType>
</MyRoot>

PHP CLI Script:

#!/usr/bin/php-cli
<?php

$xml = file_get_contents("xpeg.xml");

$domdoc = new DOMDocument();
$domdoc->loadXML($xml);

// remove the default namespace binding
$e = $domdoc->documentElement;
$e->removeAttributeNS($e->getAttributeNode("xmlns")->nodeValue,"");

// hack hack, cough cough, hack hack
$domdoc->loadXML($domdoc->saveXML($domdoc));

$xpath = new DOMXpath($domdoc);

$str = trim($argv[1]);
$result = $xpath->query($str);
if ($result !== FALSE) {
  dump_dom_levels($result);
}
else {
  echo "error\n";
}

// The following function isn't really part of the
// question. It simply provides a concise summary of
// the result.
function dump_dom_levels($node, $level = 0) {
  $class = get_class($node);
  if ($class == "DOMNodeList") {
    echo "Level $level ($class): $node->length items\n";
    foreach ($node as $child_node) {
      dump_dom_levels($child_node, $level+1);
    }
  }
  else {
    $nChildren = 0;
    foreach ($node->childNodes as $child_node) {
      if ($child_node->hasChildNodes()) {
        $nChildren++;
      }
    }
    if ($nChildren) {
      echo "Level $level ($class): $nChildren children\n";
    }
    foreach ($node->childNodes as $child_node) {
      if ($child_node->hasChildNodes()) {
        dump_dom_levels($child_node, $level+1);
      }
    }
  }
}
?>

Upvotes: 6

Answers (4)

Tertium

Reputation: 6308

Also as a variant you may use a xpath mask:

//*[local-name(.) = 'MainType'][@ID='123']

Upvotes: 0

danorton

Reputation: 12015

Given the current state of the XPath language, I feel that the best answer is provided by Tomalek: to associate a prefix with the default namespace and to prefix all tag names. That’s the solution I intend to use in my current application.

When that’s not possible or practical, a better solution than my hack is to invoke a method that does the same thing as re-scanning (hopefully more efficiently): DOMDocument::normalizeDocument(). The method behaves “as if you saved and then loaded the document, putting the document in a ‘normal’ form.”

Upvotes: 0

Tomalak

Reputation: 338228

The solution is using the namespace, not getting rid of it.

$result = new DOMDocument();
$result->loadXML($xml);

$xpath = new DOMXpath($result);
$xpath->registerNamespace("x", trim($argv[2]));

$str = trim($argv[1]);
$result = $xpath->query($str);

And call it as this on the command line (note the x: in the XPath expression)

./xpeg "//x:MainType[@ID=123]" "http://www.example.com/data"

You can make this more shiny by

finding out default namespaces yourself (by looking at the namespace property of the document element)
supporting more than one namespace on the command line and register them all before $xpath->query()
supporting arguments in the form of xyz=http//namespace.uri/ to create custom namespace prefixes

Bottom line is: In XPath you can't query //foo when you really mean //namespace:foo. These are fundamentally different and therefore select different nodes. The fact that XML can have a default namespace defined (and thus can drop explicit namespace usage in the document) does not mean you can drop namespace usage in XPath.

Upvotes: 13

cwallenpoole

Reputation: 82028

Just out of curiosity, what happens if you remove this line?

$e->removeAttributeNS($e->getAttributeNode("xmlns")->nodeValue,"");

That strikes me as the most likely to cause the need for your hack. You're basically removing the xmlns="http://www.example.com/data" part and then re-building the DOMDocument. Have you considered simply using string functions to remove that namespace?

$pieces = explode('xmlns="', $xml);
$xml = $pieces[0] . substr($pieces[1], strpos($pieces[1], '"') + 1);

Then continue on your way? It might even end up being faster.

Upvotes: 1

PHP xpath query on XML with default namespace binding

Answers (4)

Related Questions